
Building an agent system: Documentation, standards, and setup

Agent orchestration works.

DEC. 17, 2025
10 Min Read
by Adrian Obelmejias
I’ve shown the results and the workflow. But here’s what I haven’t told you yet: this approach lives or dies on documentation quality.
Not just any documentation. Not generic AI prompts. But high-quality engineering standards that serve both human developers AND AI agents.
If you’re reading this, you’re ready to build that foundation. Here’s exactly how.

The principle: Standards-driven development

The secret to effective AI agents isn’t better prompts or more powerful models. It’s comprehensive documentation of your team’s patterns, decisions, and conventions.
Here’s the key insight: The documentation you should already have for human developers is exactly what AI agents need to work effectively.
Every well-run engineering team has (or should have):
  • Coding conventions
  • Architecture decision records
  • Testing strategies
  • Best practices documentation
The same documentation that helps onboard a junior developer helps bootstrap an AI agent. No duplicate effort. No separate “AI documentation.” Just clear, comprehensive standards that serve both audiences.

The two-layer documentation strategy

Our documentation lives in two places, each serving a distinct purpose:

Layer 1: /docs/ - Standards for Humans AND Agents

docs/
├── fastapi-standards.md       # Backend API patterns
├── react-standards.md         # React component patterns  
├── testing-standards.md       # Testing approaches
├── python-standards.md        # Python conventions
├── ts-standards.md           # TypeScript conventions
├── nextjs-standards.md       # Next.js specific patterns
└── nx-monorepo-standards.md  # Monorepo conventions
These are your team’s engineering standards. They contain:
  • Exact code patterns with correct and incorrect examples
  • The “why” behind decisions (not just the “what”)
  • Common pitfalls with solutions
  • Security and compliance requirements
  • Performance considerations

Layer 2: .agents/ - Agent-Specific Context

.agents/
├── profiles/
│   ├── backend-dev.md      # Role, decisions, when to use patterns
│   ├── frontend-dev.md     # React role, component decisions
│   ├── architect.md        # System design, cross-cutting concerns
│   ├── reviewer.md         # Code quality, security checks
│   └── tester.md          # Testing strategies, edge cases
├── context/
│   ├── codebase-overview.md
│   ├── conventions.md
│   └── dependencies.md
├── workflows/
│   ├── feature-development.md
│   ├── bug-fix.md
│   └── refactoring.md
└── memory/
    └── (stores approved architectural plans)
Agent profiles provide:
  • Role and responsibilities (what this agent focuses on)
  • References to standards (where to find the patterns)
  • Decision frameworks (when to use which approach)
  • Agent-specific guidance (how to apply patterns for this role)
The key difference: Agent profiles reference the standards in /docs/ rather than duplicating them. This keeps everything in sync.

What makes standards “agent-friendly”

Through experimentation, I’ve learned what makes documentation work well for AI agents:

1. Show correct AND incorrect examples

Agents learn from contrast. Always provide both:
# Correct - Complete endpoint with all required patterns
@router.post("/", status_code=status.HTTP_201_CREATED)
async def create_user(
    session: DbSessionDep,        # Multi-tenant context
    current_user: CurrentUserDep, # Authentication required
    user_data: UserCreate,        # Pydantic validation
) -> User:
    """
    Create a new user in the organization.
    
    Requires organization_admin role. Creates audit log entry
    for HIPAA compliance.
    """
    # Permission check (never forgotten)
    if not await has_permission(current_user, "create_user"):
        raise HTTPException(status_code=403)
    
    # Business logic with proper transaction handling
    async with session.transaction():
        user = User(
            created_by=current_user.id,
            **user_data.model_dump()
        )
        session.add(user)
        await session.flush()
        
        # Audit logging for HIPAA
        await create_audit_log(
            user_id=current_user.id,
            action="user_created",
            resource_id=user.id
        )
        
        # Background task for notifications
        user_id = user.id
        async def publish_event():
            await sns_client.publish(...)
        session.on_commit(publish_event)
    
    return user
# Wrong - Missing critical patterns
@app.post("/users/")
def create_user(user: User):
    db.add(user)
    db.commit()
    return user

2. Be explicit about “why”

Agents need context for decisions:
# Use DbSessionDep (not AsyncSession) because it includes
# multi-tenant context and automatically sets the correct
# database schema based on the authenticated user
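In practice, that explanation lives right next to the code it justifies. Here’s a minimal sketch of the same comment attached to the dependency in an endpoint signature (imports omitted, as in the other examples):
@router.get("/{user_id}")
async def get_user(
    user_id: UUID,
    # Use DbSessionDep (not AsyncSession) because it includes the
    # multi-tenant context and sets the correct database schema
    # for the authenticated user
    session: DbSessionDep,
    current_user: CurrentUserDep,  # Authentication required
) -> User:
    ...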

3. Include edge cases and gotchas

Document the mistakes people make:
# Common mistake: Accessing user.id after session closes
# in background tasks. The session ends before the task runs.
# Solution: Capture primitive values before async tasks
user_id = user.id  # Capture the ID
async def send_notification():
    await notify(user_id)  # Use the captured value
session.on_commit(send_notification)

4. Structure with clear headers

Agents navigate structured docs efficiently:
## Core Principles
## When to Use This Pattern
## Common Pitfalls
## Examples
## Related Patterns

5. Reference, don’t duplicate

Agent profiles reference standards, don’t copy them:
# In agent profile:
"Follow the endpoint pattern in fastapi-standards.md section 3.2"
# Not:
"Here's the endpoint pattern: [500 lines of duplicated content]"

Real examples from our standards

Let me show you what effective standards look like. These work for both humans learning the codebase AND agents implementing features.

Example 1: FastAPI Endpoint Pattern

From docs/fastapi-standards.md:
# Correct - Complete endpoint with all required patterns
@router.post("/", status_code=status.HTTP_201_CREATED)
async def create_user(
    session: DbSessionDep,        # Multi-tenant context
    current_user: CurrentUserDep, # Authentication required
    user_data: UserCreate,        # Pydantic validation
) -> User:
    """
    Create a new user in the organization.
    
    Requires organization_admin role. Creates audit log entry
    for HIPAA compliance. Sends welcome email asynchronously.
    """
    # Permission check (never forgotten)
    if not await has_permission(current_user, "create_user"):
        raise HTTPException(status_code=403)
    
    # Business logic with proper transaction handling
    async with session.transaction():
        user = User(
            created_by=current_user.id,
            **user_data.model_dump()
        )
        session.add(user)
        await session.flush()
        
        # Audit logging for HIPAA
        await create_audit_log(
            user_id=current_user.id,
            action="user_created",
            resource_id=user.id
        )
        
        # Background task
        user_id = user.id
        async def publish_event():
            await sns_client.publish(...)
        session.on_commit(publish_event)
    
    return user
# Wrong - Missing critical patterns
@app.post("/users/")
def create_user(user: User):
    db.add(user)
    db.commit()
    return user
Why this works for agents: The agent sees the exact pattern, understands the “why” behind each line, and knows what mistakes to avoid.
Why this works for humans: New developers see the complete picture—not just syntax, but the architectural decisions and compliance requirements baked in.

Example 2: React component pattern

From docs/react-standards.md:
// Correct - Proper component structure
interface ButtonProps {
  children: React.ReactNode;
  variant?: "primary" | "secondary" | "outline";
  size?: "sm" | "md" | "lg";
  disabled?: boolean;
  onClick?: (event: React.MouseEvent<HTMLButtonElement>) => void;
}
export function Button({
  children,
  variant = "primary",
  size = "md",
  disabled = false,
  onClick,
}: ButtonProps): React.ReactElement {
  return (
    <button
      type="button"
      disabled={disabled}
      onClick={onClick}
      className={cn(
        "inline-flex items-center justify-center rounded-md font-medium",
        {
          "bg-primary text-white": variant === "primary",
          "bg-secondary text-secondary-foreground": variant === "secondary",
          "border border-input": variant === "outline",
        }
      )}
    >
      {children}
    </button>
  );
}
// Wrong - Untyped props, inconsistent naming
export function Button(props) {
  return <button className={props.class}>{props.text}</button>;
}

Example 3: Testing strategy

From docs/testing-standards.md:
# Business Logic Testing Focus
## What to test:
- Calculation functions - capacity calculations, scoring algorithms
- Permission systems - role-based access controls
- Validation rules - business rule enforcement
- State transitions - workflow state changes
- Data transformations - data processing logic
## What NOT to test:
- Framework internals - Don't test FastAPI or React behavior
- Third-party libraries - Trust they're already tested
- Simple getters/setters - Test behavior, not boilerplate
- Database queries - Test the business logic, not the ORM
## Test Structure:
async def test_user_creation_enforces_lead_access(
    db_session: DbSession,
    make_user,
    make_lead,
):
    """Test that users can only create records for leads they have access to."""
    # Arrange - Set up test data
    async with db_session.transaction():
        user = await make_user(db_session)
        lead = await make_lead(db_session, facility_id=user.facility_id)
    
    # Act - Perform the action
    result = await create_note(
        db_session, 
        user_id=user.id,
        lead_id=lead.id,
        content="Test note"
    )
    
    # Assert - Verify the outcome
    assert result is not None
    assert result.lead_id == lead.id
    assert result.created_by == user.id

Agent profile architecture

Agent profiles are where you define roles, responsibilities, and decision-making frameworks. Here’s what goes into an effective profile:

What goes in a profile

1. Role and responsibilities
  • What this agent focuses on
  • What decisions it makes
  • What it doesn’t handle
2. References to standards
  • Direct links to relevant /docs/ files
  • Clear pointers: “See section 3.2 of fastapi-standards.md”
  • Never duplicate the actual patterns
3. Decision frameworks
  • When to use which pattern
  • How to choose between approaches
  • When to ask for guidance
4. Common pitfalls
  • Role-specific gotchas
  • Mistakes to avoid
  • How to handle edge cases
5. Compliance context
  • Why certain patterns exist (HIPAA, security, etc.)
  • What happens if rules are violated
  • Audit requirements

Example: Backend developer profile

Here’s a simplified example from .agents/profiles/backend-dev.md:
# Backend Developer Agent Profile
## Core Identity
You are a Backend Development Specialist for this healthcare platform.
## Key Standards (Read These First!)
Your implementation patterns live here - always follow them:
- [FastAPI Standards](../../docs/fastapi-standards.md) Read this first
- [Python Standards](../../docs/python-standards.md)
- [Testing Standards](../../docs/testing-standards.md)
- [Multi-Tenant Architecture](../../docs/multi-tenant-architecture.md)
These docs contain the exact code patterns, syntax, and examples you must follow.
## Your Role and Responsibilities
1. API Development: Implement endpoints following FastAPI standards
2. Multi-tenant Operations: Ensure proper tenant isolation via schema separation
3. HIPAA Compliance: Never expose PHI in logs or errors (see standards for examples)
4. Background Tasks: Use session.on_commit() pattern (see standards for implementation)
## Decision Framework
When should you add audit logging?
→ Always, for any operation that touches PHI (see Testing Standards for examples)
When should you use DbSessionDep vs AsyncSession?
→ Always use DbSessionDep - it includes multi-tenant context automatically
   (see FastAPI Standards section 4.2 for why)
When should you create a background task?
→ For operations that take >200ms or involve external services
   (see FastAPI Standards section 6.3 for the pattern)
## Common Pitfalls (Role-Specific)
Pitfall 1: Forgetting Multi-Tenant Context
Wrong: Using AsyncSession directly
Correct: Always use DbSessionDep
See: FastAPI Standards section 4.2
Pitfall 2: Session Access in Background Tasks  
Wrong: Accessing user.id after session closes
Correct: Capture primitive values before the task
See: FastAPI Standards section 6.3
Pitfall 3: Exposing PHI in Logs
Wrong: logger.info(f"User {user.name} logged in")
Correct: logger.info(f"User {user.id} logged in")
See: Python Standards section 8.1 - HIPAA Logging
## Compliance Requirements
HIPAA Audit Logging:
- Log all access to PHI (reads, writes, deletes)
- Include: user_id, action, resource_id, timestamp, IP address
- Never log: actual PHI content, passwords, tokens
Multi-Tenant Isolation:
- Database schema separation (automatic via DbSessionDep)
- Never query across schemas
- All queries scoped to current tenant automatically
## When to Ask for Guidance
- Novel architectural patterns not covered in standards
- Security implications unclear
- Performance tradeoffs requiring business input
- Compliance questions beyond documented patterns
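To ground the audit-logging requirement, here’s a rough sketch of what a helper like the create_audit_log call in the endpoint examples might look like. The AuditLog model, the persistence helper, and the exact field set are assumptions for illustration, not our actual implementation:
from datetime import datetime, timezone
from uuid import UUID

async def create_audit_log(
    *,
    user_id: UUID,
    action: str,                   # e.g. "user_created", "record_viewed"
    resource_id: UUID,
    ip_address: str | None = None,
) -> None:
    """Record who did what to which resource, and when.

    Deliberately absent: PHI content, passwords, tokens.
    Only identifiers and metadata are persisted.
    """
    entry = AuditLog(              # model assumed for illustration
        user_id=user_id,
        action=action,
        resource_id=resource_id,
        ip_address=ip_address,
        created_at=datetime.now(timezone.utc),
    )
    await save_audit_entry(entry)  # persistence helper assumed; in practice
                                   # this writes through the tenant-scoped session
The point isn’t the exact shape of the helper. It’s that the profile tells the agent which fields matter and what must never be logged.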

Technical setup: Git worktrees and Docker isolation

Now let’s get into the practical setup for parallel development.

Git worktrees configuration

Git worktrees let you have multiple branches checked out simultaneously, each in its own directory:
# From your main repository
cd ~/projects/client-platform
# Create worktrees directory
mkdir -p ~/projects/client-worktrees
# Create a worktree for a new feature
git worktree add ~/projects/client-worktrees/feature-document-api -b feature/document-api
# Create worktree for a bug fix
git worktree add ~/projects/client-worktrees/fix-users-api -b fix/users-api-joins
# Your directory structure now looks like:
# ~/projects/client-platform          (main repo)
# ~/projects/client-worktrees/
#   ├── feature-document-api/            (separate branch)
#   ├── fix-users-api/                   (separate branch)
#   └── feature-contact-confirm/         (separate branch)

Docker isolation per worktree

Each worktree needs isolated Docker containers. Use COMPOSE_PROJECT_NAME:
# In each worktree, set a unique project name
cd ~/projects/client-worktrees/feature-document-api
export COMPOSE_PROJECT_NAME="agent-document-api"
docker compose up -d
# Different worktree, different containers
cd ~/projects/client-worktrees/fix-users-api
export COMPOSE_PROJECT_NAME="agent-users-api"
docker compose up -d
This creates completely isolated environments:
  • Separate database containers
  • Separate Redis instances
  • Separate service ports
  • No conflicts, no contamination

Context injection: How to initialize an agent

When starting work in a worktree, you inject context into your AI agent:
cd ~/worktrees/feature-new-api
# Feed the agent its role and context
ROLE: @.agents/profiles/backend-dev.md
CONTEXT:
- @.agents/context/codebase-overview.md
- @.agents/context/conventions.md
- @.agents/context/dependencies.md
TASK: Implement user preferences API
- Endpoints: GET/PUT /v1/users/{id}/preferences
- Storage: JSONB column for flexibility
- Validation: Max 10KB preferences size
Let's start with the database schema.
The @ syntax tells your AI tool (Cursor, Claude Code, etc.) to read and inject the file contents.
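To make the expected output concrete, here’s a rough sketch of the PUT endpoint an agent might produce from this context when it follows the standards above. The UserPreferences model, the get_preferences helper, and the JSONB field name are assumptions for illustration (imports omitted, as in the earlier examples):
@router.put("/{user_id}/preferences")
async def update_preferences(
    user_id: UUID,
    session: DbSessionDep,          # Multi-tenant context
    current_user: CurrentUserDep,   # Authentication required
    preferences: dict[str, Any],    # Stored in a JSONB column
) -> UserPreferences:
    """Replace the user's preferences, enforcing the 10KB size limit."""
    # Validation: max 10KB preferences size (from the task)
    if len(json.dumps(preferences).encode("utf-8")) > 10 * 1024:
        raise HTTPException(status_code=413, detail="Preferences exceed 10KB")

    # Permission check (never forgotten)
    if not await has_permission(current_user, "update_preferences"):
        raise HTTPException(status_code=403)

    async with session.transaction():
        record = await get_preferences(session, user_id)  # helper assumed
        record.data = preferences
        await session.flush()

        # Audit logging for HIPAA
        await create_audit_log(
            user_id=current_user.id,
            action="preferences_updated",
            resource_id=user_id,
        )

    return record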

Getting started: 4-week plan

Don’t try to build everything at once. Start small and iterate.

Week 1: Single agent, single worktree

Goal: Get comfortable with the guide/review cycle.
  1. Choose your model: Start with Claude Sonnet 4.5 (no extended thinking) for implementation tasks
  2. Document one pattern: Pick your most common task (e.g., “How we write API endpoints”) and document it completely with correct/incorrect examples
  3. Create your first agent profile: Backend or frontend developer, 50-100 lines
  4. Take one feature: Use a single agent in one worktree
  5. Practice guiding: Focus on reviewing and guiding, not typing
Success criteria: You complete one feature using agent orchestration and feel comfortable with the review cycle.

Week 2: Two agents in parallel

Goal: Experience parallel development.
  1. Enhance your agent profile: Add lessons learned from Week 1
  2. Document another pattern: Add testing or component patterns
  3. Take a feature with backend + frontend: Run two agents in parallel worktrees
  4. Practice context switching: Get comfortable switching between reviews
Success criteria: You complete a feature that touches both backend and frontend, managing two agents simultaneously.

Week 3: Full orchestration

Goal: Work on 3-4 features simultaneously.
  1. Add specialized agents: Create architect and reviewer profiles
  2. Document more patterns: Fill out your standards library
  3. Use the Architect → Plan → Implement pattern: For complex features, architect first
  4. Work on 3-4 features simultaneously: Mix of complexities
  5. Refine workflows: Adjust based on what works
Success criteria: You ship 3+ features in a day using orchestration and feel the cognitive shift.

Week 4: Team rollout

Goal: Scale to your team.
  1. Document your workflow: Write a team guide based on your experience
  2. Share your agent profiles: Let others use and improve them
  3. Review and refine standards: Make them work for the whole team
  4. Onboard one team member: Help someone else start with orchestration
Success criteria: At least one other person on your team is successfully using agent orchestration.

The maintenance strategy

Here’s how we keep everything in sync:
Standards evolve in /docs:
  • PR introduces a new pattern? Update the standard.
  • Architecture decision made? Document it.
  • Security issue discovered? Add to the standards.
Agent profiles stay stable:
  • They reference standards, not duplicate them
  • Changes to patterns don’t require updating agent profiles
  • Agents always see the latest patterns via references
The principle: Standards are the source of truth. Agent profiles are the lens through which agents view those standards.

Key lessons learned

After several weeks of working this way, here’s what I’ve learned:

1. Always architect complex features first

Don’t let a dev agent jump straight into implementation on non-trivial features. Use the Architect → Plan → Review → Implement workflow. The 20 minutes spent planning saves hours of refactoring.
Store these plans in .agents/memory/ so the dev agent has a roadmap. And critically: use your most powerful model (Opus or Sonnet 4.5 with extended thinking) for the architect—this is where complex reasoning matters most.

2. The bottleneck shifts to you

Your ability to context switch between reviews becomes the limiting factor. This is a good problem to have.

3. Quality improves

Continuous real-time review catches issues immediately. No more “I’ll review it later” backlog. You catch architectural issues (like missing permission checks) that agents might miss.

4. Consistency is automatic

Agents follow patterns perfectly. No more style drift across the codebase.

5. Context is everything

The more context you provide to agents (ticket details, acceptance criteria, QA feedback), the better their output. Tools like MCP servers can help automate this.

6. Documentation investment pays double

Every standard you write helps both human developers AND AI agents. No duplicate effort.

7. Start with what you have

You don’t need perfect documentation to start. Document one pattern, try it with one agent, and iterate based on mistakes.

8. The tool doesn’t matter—the system does

I use Cursor CLI for this workflow, but the principles work with any AI coding tool (Claude Code, GitHub Copilot, etc.). What matters is: specialized agent profiles, git worktrees for isolation, proper context injection, and the orchestration mindset.
Pick the tool that fits your workflow, but focus on building the system.

9. You can actually be a tech lead AND ship code

This is huge. While agents work on implementation, I can review PRs, answer Slack questions, groom the backlog, and handle architectural decisions—all without losing my place. The agents keep working while I context-switch to leadership responsibilities. When I return, I just review their progress.
No mental model to rebuild. No “where was I?” moment.

Real impact: Before and after

Let me show you the concrete difference documentation makes:
Before (generic AI assistant):
Me: "Create a user endpoint"
AI: Creates basic CRUD without auth, tenant isolation, or audit logging
Me: "Add authentication"
AI: Adds auth but forgets tenant isolation
Me: "Add multi-tenant support"
AI: Adds tenant check but uses wrong dependency
Me: "Add audit logging"
AI: Adds logging but exposes PHI
[After 10 iterations, still not production-ready]
After (agent with standards):
Me: "Create a user endpoint following our standards"
Agent: Reads fastapi-standards.md
Agent: Implements complete endpoint with:
       - Proper dependency injection (DbSessionDep, CurrentUserDep)
       - Multi-tenant isolation via schema
       - Permission checks
       - HIPAA-compliant audit logging
       - Background task pattern
       - Comprehensive tests
Result: production-ready on the first iteration.
The difference isn’t the AI model—it’s the documentation.

Start small, build momentum

You don’t need perfect documentation to start. Begin with:
  1. Document one pattern completely (e.g., “How we write API endpoints”)
  2. Show correct and incorrect examples
  3. Explain the “why” behind decisions
  4. Create your first agent profile that references it
  5. Try one feature with one agent
  6. Iterate based on what the agent gets wrong
Every time an agent makes a mistake, ask: “Is this documented in our standards?” If not, add it. Over time, your standards become comprehensive and your agents become more accurate.
The documentation you build for agents makes your team better—whether those team members are human or AI.