The Agentic Developer's Playbook

From first prompt to production agent teams — master agentic software engineering with Claude Code, Codex CLI, and beyond.

What This Book Is

This is a structured, practice-first curriculum for developers who want to go beyond basic AI coding tool usage and master agentic software engineering — the discipline of building software with AI agents as collaborators, not just autocomplete.

You will learn:

  • Mental models — how to think about AI agents as development partners
  • Progressive skills — from your first session to orchestrating agent teams
  • Cross-tool patterns — principles that work in Claude Code, Codex CLI, and future tools
  • Battle-tested workflows — patterns rated by maturity from real practitioner experience

Who This Is For

You are...Start here
New to AI coding CLIsWhat Is Agentic Development?
Using Claude Code / Codex CLI at a basic levelProject Memory
Comfortable but want advanced patternsSub-Agents
Looking for a specific patternWorkflow Patterns
Debugging an agent issueTroubleshooting
Wanting to see what NOT to doAnti-Patterns

How to Read This Book

Three ways to use this resource:

1. Book Mode (sequential)

Read the curriculum from Foundation through Advanced. Each module builds on the previous. This is the best approach if you're learning these tools for the first time or want to fill gaps systematically.

2. Reference Mode (random access)

Jump directly to the Reference section for specific lookups — patterns, anti-patterns, cheatsheets, configs, prompts, and troubleshooting guides. Best for developers who already use these tools and need a specific answer.

3. Project Mode (hands-on)

Go straight to the Practice section and learn by building. The CLI Todo App project walks you through agent-assisted development from scratch. Case studies show how real workflows come together.

Time Estimates

TierModulesEstimated Time
FoundationWhat Is Agentic Dev, First Hour, Project Memory, Prompting~3 hours
IntermediateHooks & Commands, Sub-Agents, MCP Servers, Session Architecture~4 hours
AdvancedAgent Teams, Headless/CI, Orchestration, Team Adoption~5 hours
ReferenceBrowse as needed
PracticeCLI Todo App + Case Studies~3 hours

Suggested Learning Paths

The Weekend Sprint — Modules 00-03 (~3 hours) Get from zero to productive. You'll understand the mental model, have a working setup, master project memory, and know how to prompt effectively.

The Practitioner Path — Modules 00-07 (~7 hours) Everything in the Weekend Sprint plus hooks, sub-agents, MCP servers, and session architecture. You'll be an intermediate-to-advanced practitioner.

The Full Journey — All modules (~12 hours) Foundation through Advanced, plus reference browsing and the practice project. You'll understand the full landscape of agentic software engineering.

Each Module Contains

Every curriculum module has four parts:

  • Concepts — tool-agnostic principles and mental models (~15-20 min read)
  • Claude Code — implementation specifics for Claude Code (~15-20 min)
  • Codex CLI — implementation specifics for Codex CLI (~15-20 min)
  • Exercises — hands-on practice on your own projects (~20-30 min)

You can read just the Concepts page for the principles, or dive into the tool-specific pages for hands-on implementation.

Pattern Maturity Levels

Content in this book is rated by how well-tested it is:

LevelMeaning
ExperimentalTried by 1-2 people, promising but unvalidated
ProvenUsed successfully by multiple practitioners
Battle-TestedValidated in production by teams, edge cases documented

Contributing

This is an open-source project. See the Contributing appendix for how to improve this book, add patterns, or share your own case studies.


title: What Is Agentic Development? last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: []

What Is Agentic Development?

The Core Question: "How Is This Different from Autocomplete?"

You have probably used code completion tools. You type a few characters, a gray suggestion appears, you press Tab, and you move on. Maybe you have tried a chat interface where you paste code and ask a question. Those are useful, but they are not what this module is about.

Agentic development is a fundamentally different way of working with AI. The difference is not speed or accuracy of suggestions — it is about who is driving. In autocomplete, you drive and the AI offers passive suggestions. In agentic development, you state your intent and the AI drives toward it, reading files, running commands, editing code, and course-correcting along the way. You shift from typist to director.

That distinction matters because it changes what you can accomplish in a given unit of time, and it changes which tasks are worth delegating at all.

The Mental Model Shift: Collaborators, Not Suggestions

The most important thing to internalize early is this: an AI agent is not a smarter autocomplete. It is closer to a junior developer sitting at a terminal next to you. It can read your codebase, run your tests, look at error output, and try again. It maintains context across a multi-step task. It makes mistakes, and it can often fix them when you point them out.

This means your relationship with the tool changes. You stop thinking "what should I type next?" and start thinking "what do I want to be true when this task is done?" You provide the intent, the constraints, and the judgment calls. The agent provides the exploration, the mechanical execution, and the patience to try multiple approaches.

If you have ever delegated a task to a teammate by writing a clear description of what you want, reviewing their pull request, and giving feedback — you already know the workflow. Agentic development is that loop, compressed into minutes instead of hours.

Three Modes of AI-Assisted Development

It helps to have a clear taxonomy. There are three distinct modes of working with AI as a developer, and they are not interchangeable.

1. Completion (Copilot-style) You write code. The AI predicts the next few tokens or lines. You accept, reject, or modify. The AI has limited context — usually the current file and maybe a few open tabs. This is reactive and line-level. It shines for boilerplate, repetitive patterns, and finishing thoughts you have already started.

2. Chat (Ask questions) You copy-paste code or describe a problem. The AI responds with an explanation, a code snippet, or a suggestion. Context is limited to what you explicitly provide in the conversation. This is useful for learning, debugging specific errors, and getting unstuck. But you are still the one doing all the mechanical work — opening files, running commands, making edits.

3. Agentic (Autonomous multi-step tasks) You describe a goal. The agent decomposes it into steps, reads relevant files from your codebase, writes or edits code, runs commands to verify its work, and iterates. It maintains context across the entire task. You review the result and steer when needed. This is the mode we focus on in this curriculum.

These modes are not ranked. Autocomplete is perfect for finishing a line of code. Chat is perfect for asking "why does this regex not match?" Agentic is perfect for "add input validation to all the API endpoints in this service." Knowing which mode fits which task is a core skill, and one of the exercises at the end of this module asks you to practice exactly that.

What Makes Something "Agentic"

A tool qualifies as agentic when it can autonomously perform multiple steps toward a goal. Specifically, an agentic coding tool can:

  • Read files from your project without you copy-pasting them in.
  • Run commands — tests, linters, build tools, shell commands — and observe the output.
  • Make edits to existing files or create new ones.
  • Maintain context across a conversation, remembering what it read, what it tried, and what failed.
  • Decompose tasks into sub-steps, deciding what to do next based on what it has learned so far.

The combination is what matters. A tool that can read files but not run commands is a search engine. A tool that can edit files but not verify the result is a blind typist. The agentic loop — plan, act, observe, adjust — is what produces reliable results on non-trivial tasks.

The Leverage Thesis

Agentic development is not just a speedup. It is a multiplier. The distinction matters.

A speedup means you do the same work faster. If adding a feature takes 2 hours and the tool cuts it to 1 hour, that is a speedup. Useful, but linear.

A multiplier means you can take on tasks you would not have attempted before. Refactoring a codebase to use a new pattern across 50 files. Investigating a bug by reading through 20 source files and correlating behavior. Writing comprehensive tests for an under-tested module. These are tasks where the bottleneck was never your typing speed — it was the sheer volume of reading, context-switching, and mechanical editing. An agent collapses that overhead.

This also means agentic development changes your economics. Tasks that were "not worth the time" become worth it. Technical debt that you would have lived with for months becomes something you can address in an afternoon. The ceiling on what a single developer can accomplish in a day shifts upward.

The Collaboration Model

The most productive framing is a division of labor:

You provide:

  • Intent — what you want to accomplish and why.
  • Judgment — is this approach correct? Is the result acceptable? Does it match the project's conventions?
  • Constraints — what not to touch, what patterns to follow, what edge cases matter.
  • Domain knowledge — business logic, user requirements, architectural decisions that are not in the code.

The agent provides:

  • Exploration — reading files, understanding structure, finding relevant code.
  • Execution — writing code, running commands, making edits across files.
  • Iteration — trying an approach, checking the result, adjusting if it fails.
  • Patience — doing tedious, repetitive work without fatigue or shortcuts.

This split means your job shifts toward higher-level thinking. You spend more time on "what should we build?" and "is this correct?" and less time on "where is that file?" and "let me update these 15 import statements."

When Agentic Development Works Well

Agentic tools excel when:

  • The task involves reading and modifying multiple files.
  • The task is well-defined enough that you can describe the desired outcome.
  • The codebase has reasonable structure — the agent can navigate it.
  • Verification is possible — you can run tests, check output, or review diffs.
  • The task is mechanical or exploratory — not deeply novel research.

Examples of strong fits: adding a new API endpoint following existing patterns, writing tests for existing code, refactoring a module, investigating a bug by tracing through code, updating configuration across files, explaining unfamiliar code.

When Agentic Development Does Not Work Well

Be honest about the limitations:

  • Highly ambiguous tasks where you cannot articulate what you want. The agent will produce something, but it may not be what you need, and the back-and-forth may cost more time than doing it yourself.
  • Tasks requiring deep domain expertise that is not in the codebase. The agent can only work with what it can read.
  • Security-critical code where you need to personally verify every line. The agent is a tool, not a substitute for your judgment.
  • Novel architecture decisions — choosing between approaches, designing systems from scratch. The agent can help explore options, but the decision is yours.
  • Tasks with no feedback loop — if there is no way to verify the result (no tests, no compiler, no observable behavior), the agent is working blind.

The pattern is straightforward: the more clearly you can describe "done" and the more mechanically verifiable the result is, the better an agent performs. The more judgment, taste, and ambiguity involved, the more you need to stay in the driver's seat.

Key Takeaways

  • Agentic development is a different mode of working, not a better autocomplete. The agent reads your code, runs commands, makes edits, and iterates — you provide direction and judgment.
  • There are three modes of AI-assisted development — completion, chat, and agentic — and knowing which to reach for is a core skill.
  • The leverage is multiplicative, not additive. You do not just do the same work faster; you take on work you would not have attempted.
  • Your job shifts from execution to direction. You describe intent, set constraints, and review results. The agent handles exploration and mechanical work.
  • Agentic tools have clear strengths and weaknesses. They excel at well-defined, multi-step, verifiable tasks. They struggle with ambiguity, novel design, and work that requires judgment only you can provide.

title: "What Is Agentic Development? — Claude Code" tested_with: claude-code: "1.0.x" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: []

What Is Agentic Development? — Claude Code

What Claude Code Is

Claude Code is Anthropic's terminal-based agentic coding tool. You run it in your terminal, inside your project directory, and it operates directly in your development environment. It reads your files, edits your code, runs your commands, and works with your actual tools — not a copy, not a sandbox, not a browser-based approximation.

This is the key distinction from web-based chat interfaces. Claude Code does not ask you to paste code into a text box. It reads your codebase itself. It does not give you a snippet to copy back — it edits the file directly. It does not tell you to run a command — it runs the command and reads the output. You stay in your terminal, in your repo, with your git history and your tools.

The Architecture

Understanding the architecture helps you predict what Claude Code can and cannot do.

Your Terminal
    |
    v
Claude Code (local CLI process)
    |
    v
Claude model (Anthropic API)
    |
    v
Tools: File Read | File Write/Edit | Bash | Search | Sub-agents

When you type a message, Claude Code sends it — along with relevant context about your project — to the Claude model via Anthropic's API. The model decides what to do: read a file, search the codebase, run a command, or make an edit. Claude Code executes that action locally on your machine, sends the result back to the model, and the loop continues until the task is complete or the model needs your input.

Everything runs locally on your machine except the model inference. Your code is sent to the API for processing, but the file operations, command execution, and edits all happen in your local environment. Your project files live on your disk, not in a cloud workspace.

Key Capabilities

File reading. Claude Code can read any file in your project. It uses this to understand your codebase — reading source files, configuration, tests, documentation, and package manifests. When you ask a question about your code, it reads the relevant files rather than guessing.

File editing. Claude Code makes targeted edits to files. It performs search-and-replace operations on specific sections of code, which means it can modify a function without rewriting the entire file. You see a diff of every change before it is applied (depending on your permission settings).

Search. Claude Code can search across your codebase using pattern matching — finding function definitions, usages, configuration values, or any text pattern. This is how it navigates unfamiliar codebases and traces through code.

Bash execution. Claude Code can run shell commands in your terminal environment. This includes running tests, installing dependencies, checking git status, running linters, building projects, and anything else you would type at a command line. This is what closes the feedback loop — the agent can verify its own work.

Sub-agents. For complex tasks, Claude Code can spawn sub-agents — lightweight, focused instances that handle a specific part of a larger task. For example, the main agent might use a sub-agent to research a question before proceeding with an implementation. This happens automatically; you do not need to configure it.

MCP (Model Context Protocol). Claude Code supports MCP servers, which are plugins that extend its capabilities. MCP servers can provide access to databases, APIs, documentation, and other external tools. If your team has custom tooling, MCP is how you connect it.

Hooks. Claude Code supports hooks — scripts that run automatically at specific points in the workflow. For example, you can configure a hook that runs your linter after every file edit, or a hook that formats code before it is written. Hooks let you enforce project standards automatically.

A Typical Session

Here is what a real session looks like, step by step.

  1. You open your terminal in a project directory and run claude.
  2. You describe what you want. For example: "Add input validation to the createUser endpoint. Reject requests where the email field is missing or not a valid email format."
  3. The agent plans. Claude Code reads the relevant files — the route handler, the existing validation logic, the tests. It may search for how validation is done elsewhere in the project to match existing patterns.
  4. The agent acts. It edits the route handler to add validation. It may update or create tests. It runs the test suite to verify nothing broke.
  5. You review. You see the diffs, the command output, and the agent's reasoning. You approve, request changes, or ask follow-up questions.
  6. The loop continues until the task is done. If tests fail, the agent reads the error output and fixes the issue. If you ask for a change, it adjusts.

The critical point: this is an iterative loop, not a one-shot generation. The agent's ability to observe the results of its actions and adjust is what makes it agentic rather than a fancy code generator.

How Claude Code Differs from Web Chat Interfaces

If you have used ChatGPT, Gemini, or Claude on the web, you are used to a copy-paste workflow: you paste code in, get a response, copy code out, paste it into your editor, try it, and go back to the chat if it does not work.

Claude Code eliminates that round-trip. The differences:

  • It reads your files directly. No need to paste your code in or describe your project structure. It explores your codebase itself.
  • It edits files in place. No copying from a chat window. Changes go directly into your files.
  • It runs commands in your environment. When it runs your tests, it is running your actual test suite with your actual dependencies, not simulating or guessing.
  • It has persistent context within a session. It remembers what it read, what it tried, and what you discussed. You do not need to re-explain context after every exchange.
  • It operates at the project level, not the snippet level. It can make coordinated changes across multiple files in a single task.

The tradeoff is that it requires terminal comfort. If you are not used to working in a terminal, there is a small learning curve. But if you already live in your terminal, Claude Code fits naturally into your existing workflow.

Quick Install and First Run

Prerequisites: Node.js 18 or later.

Install:

npm install -g @anthropic-ai/claude-code

Set your API key (if you have not already):

export ANTHROPIC_API_KEY=your-key-here

You can add this to your shell profile (~/.bashrc, ~/.zshrc, etc.) so it persists across sessions.

First run:

cd /path/to/your/project
claude

Claude Code starts an interactive session. You see a prompt where you can type your first message. Try something simple to verify it works:

> Describe the structure of this project. What are the main components?

Watch what happens. The agent will read files — your package.json, your directory structure, your main source files — and synthesize a description. Pay attention to the tool calls it makes. This is the agentic loop in action.

Non-interactive mode (for scripting or one-shot tasks):

claude -p "Explain what the main function in src/index.ts does"

The -p flag runs a single prompt and exits, which is useful for quick questions or automation.

The Permission Model

Claude Code gives you control over what the agent can do without asking.

By default, Claude Code asks for your approval before:

  • Writing or editing files
  • Running shell commands (with some exceptions for safe, read-only commands)

This means you see every action before it happens and can approve or reject it. For your first sessions, this is exactly what you want — you get to observe the agent's decision-making and build trust incrementally.

As you get comfortable, you can grant broader permissions:

  • Allow specific tools to run without approval (e.g., allow all file reads, or allow specific commands like npm test).
  • Accept all edits during a session when you trust the direction and want to move faster.

The permission model exists because Claude Code runs in your real environment. It can do anything you can do in your terminal. The guardrails are there so you stay in control, especially early on when you are still learning how the agent behaves.

A sensible starting posture: leave defaults in place for your first few sessions. Once you understand the agent's behavior on your specific codebase, selectively loosen permissions for actions you trust. You can always tighten them again.

CLAUDE.md files. Claude Code reads CLAUDE.md files in your project root and subdirectories. These files contain project-specific instructions — coding conventions, preferred patterns, things to avoid. If your project has one, Claude Code follows those instructions automatically. If it does not, creating one is a high-leverage early investment. You will learn more about this in later modules.


title: "What Is Agentic Development? — Codex CLI" tested_with: codex-cli: "0.2.x" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: []

What Is Agentic Development? — Codex CLI

What Codex CLI Is

Codex CLI is OpenAI's terminal-based agentic coding tool. Like Claude Code, it runs in your terminal, inside your project directory, and works directly with your codebase. You describe a task in natural language, and Codex reads files, writes code, runs commands, and iterates toward a solution.

Codex CLI is open source and designed around a principle of safe-by-default execution. Its defining characteristic is a sandboxed execution model — commands run in an isolated environment that limits what the agent can do to your system unless you explicitly opt out. This makes it a good fit if you want agentic capabilities with tighter guardrails out of the box.

The Architecture

Your Terminal
    |
    v
Codex CLI (local process)
    |
    v
OpenAI models (OpenAI API)
    |
    v
Sandboxed Execution: File Read | File Write | Command Execution

When you type a message, Codex CLI sends it — along with context about your project — to an OpenAI model (typically GPT-4.1 or o4-mini). The model decides what to do: read files, edit code, or run commands. Codex CLI executes those actions in a sandboxed environment on your machine and sends the results back to the model. The loop continues until the task is complete.

The sandbox is the architectural distinction. By default, Codex CLI uses platform-level isolation (network-disabled containers on Linux, or seatbelt sandboxing on macOS) to ensure that commands the agent runs cannot make network requests or modify files outside your project directory. This is a meaningful safety property, especially when running in more autonomous modes.

Key Capabilities

File reading. Codex CLI reads files from your project to understand context. It examines source code, configuration files, and project manifests to orient itself before making changes.

File editing. Codex CLI creates and modifies files in your project. Edits are presented as patches that you can review. In more autonomous modes, edits are applied directly.

Command execution. Codex CLI runs shell commands — tests, builds, linters, and other tools. By default, these commands run inside the sandbox, which means they have no network access and cannot modify files outside the working directory. This is a deliberate safety decision: even if the model decides to run something unexpected, the blast radius is contained.

Sub-agents. For multi-part tasks, Codex CLI can use sub-agents to handle specific pieces of work, similar to how Claude Code uses them. The main agent coordinates while sub-agents focus on individual steps.

The Sandbox Model

The sandbox is central to how Codex CLI works, so it is worth understanding clearly.

What the sandbox does:

  • Disables network access for all commands the agent runs.
  • Restricts file writes to your project directory (and temporary directories).
  • Uses OS-level isolation, not just permission checks — this is enforced at the kernel level.

Why this matters:

  • If the agent tries to curl something or pip install a package, it will fail inside the sandbox. This is by design.
  • If the agent tries to modify files outside your project, it will fail.
  • You can run in more autonomous modes without worrying that the agent will make unintended changes to your system.

When the sandbox gets in the way:

  • Tasks that require network access (installing dependencies, fetching data, calling APIs) will not work inside the sandbox.
  • You can disable the sandbox for specific sessions if you need network access, but you are then accepting the tradeoff.

The sandbox is the primary reason Codex CLI can offer a full-auto mode with reasonable safety properties. More on that below.

Approval Modes

Codex CLI offers three operating modes that control how much autonomy the agent has. Choosing the right mode is an important decision.

1. Suggest (default) The agent reads files and proposes changes, but does not apply them or run commands without your explicit approval. Every action requires a confirmation step. This is the safest mode and the right starting point for new users.

Use this when: you are learning the tool, working in an unfamiliar codebase, or doing anything where you want to review every step.

2. Auto-edit The agent can read and write files without approval, but command execution still requires confirmation. This is a good middle ground — the agent can move quickly on code changes while you retain control over what commands run.

Use this when: you trust the agent's edits on your codebase and want to move faster, but still want to approve shell commands.

3. Full-auto The agent reads files, writes files, and runs commands without asking. The sandbox is your safety net here — since commands cannot access the network or modify files outside your project, the blast radius of any mistake is contained.

Use this when: the task is well-defined, you have version control (so you can revert), and the sandbox restrictions do not conflict with what the agent needs to do. This mode is where you experience the full speed of agentic development.

The recommendation: start with suggest mode. Move to auto-edit once you are comfortable with the agent's editing behavior. Move to full-auto only for tasks where the sandbox provides sufficient guardrails and you can verify the result after.

How Codex CLI Differs from Claude Code

Both are terminal-based agentic coding tools, but they make different design choices.

Sandbox vs. permission model. Codex CLI defaults to sandboxed execution — commands are isolated at the OS level. Claude Code defaults to a permission-based model — you approve actions individually. The sandbox approach is more restrictive but requires less vigilance. The permission approach is more flexible but requires you to pay attention.

Underlying models. Codex CLI uses OpenAI models (GPT-4.1, o4-mini, and others in the OpenAI family). Claude Code uses Anthropic's Claude models. The models have different strengths: Claude tends to be strong at careful reasoning, following nuanced instructions, and producing well-structured prose. OpenAI models tend to be strong at code generation breadth and multi-language support. In practice, both are capable enough for most agentic coding tasks.

Configuration approach. Codex CLI uses AGENTS.md files for project-specific instructions — the equivalent of Claude Code's CLAUDE.md. The format and purpose are similar: you describe your project's conventions, preferred patterns, and constraints, and the agent follows them.

Network access. Codex CLI blocks network access by default inside the sandbox. Claude Code does not restrict network access (commands run in your normal shell environment). This matters for tasks that require dependency installation, API calls, or fetching resources.

Open source. Codex CLI is open source. You can read the code, understand exactly what it does, and modify it. Claude Code is not open source, though its behavior is well-documented.

Neither tool is universally better. The right choice depends on your workflow, your preferred model, and how much you value sandboxing vs. flexibility. Many developers use both.

Quick Install and First Run

Prerequisites: Node.js 22 or later.

Install:

npm install -g @openai/codex

Set your API key:

export OPENAI_API_KEY=your-key-here

Add this to your shell profile (~/.bashrc, ~/.zshrc, etc.) so it persists.

First run:

cd /path/to/your/project
codex

Codex CLI starts an interactive session. You see a prompt where you can type your first message. Try:

> Explain the structure of this project and what each top-level directory contains.

Observe the agent's behavior. It will read your directory listing, examine key files, and synthesize a summary. Notice which files it chooses to read and in what order — this gives you insight into how the agent navigates.

Specifying approval mode:

codex --approval-mode full-auto

Or for suggest mode (the default):

codex --approval-mode suggest

One-shot mode (non-interactive):

codex -q "What does the main function in src/index.ts do?"

The -q flag (quiet) runs a single prompt and prints the result, useful for quick questions.

AGENTS.md

Codex CLI reads AGENTS.md files from your project root and subdirectories. This is where you put project-specific instructions that the agent should follow: coding standards, architectural patterns, libraries to prefer or avoid, and any context that is not obvious from the code itself.

If you are setting up Codex CLI on an existing project, creating an AGENTS.md with basic guidance is a worthwhile first step. Later modules in this curriculum will cover how to write effective agent instruction files in detail.


title: "Exercises — What Is Agentic Development?" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: []

Exercises — What Is Agentic Development?

These exercises are designed to build your intuition for agentic development through direct experience. Do them in order — each one builds on the mental model from the previous.


Exercise 1: Install and First Task

Objective

Install an agentic coding tool and observe how it operates on a real project. The goal is not to accomplish something specific — it is to watch the agent work so you understand the agentic loop: plan, act, observe, adjust.

Steps

  1. Choose your tool. Install either Claude Code or Codex CLI (or both, if you want to compare). Follow the installation instructions in the respective tool guide in this module.

  2. Pick a project. Navigate to any codebase you have on your machine. It does not need to be large or complex — a side project, a tutorial you followed, or even a small open source repo you have cloned. The only requirement is that it has more than a handful of files, so the agent has something to explore.

  3. Ask the agent to explain the project. Start the tool and type something like:

    Describe the structure of this project. What are the main components,
    what language and frameworks does it use, and how are things organized?
    
  4. Watch carefully. As the agent works, pay attention to:

    • Which files does it read first? (Look at the tool calls or file access notifications.)
    • Does it read the package.json, Cargo.toml, pyproject.toml, or equivalent first? Or does it start with source files?
    • How many files does it read before it starts answering?
    • Does it search for patterns, or does it navigate the file tree?
    • Does the answer match your understanding of the project?
  5. Write down your observations. Specifically note:

    • The total number of files the agent read.
    • The tools it used (file read, search, bash commands).
    • Anything it got wrong or misunderstood.
    • Anything it noticed that you had forgotten about or did not know.

Expected Outcome

You should have the tool installed and running, and you should have a concrete understanding of how the agent explores a codebase. You will likely notice that the agent reads a surprisingly large number of files and that its exploration strategy is systematic — it starts with high-signal files (manifests, entry points, configuration) and drills into specifics.

Hints

  • If you do not have a project handy, clone a small open source repo. Something like a Express.js starter, a small CLI tool, or a Flask app works well.
  • If the agent's response seems shallow, follow up: "What does the services/ directory contain? Trace through how a request flows from the API layer to the database." This pushes the agent to read deeper.
  • If you are using Codex CLI in suggest mode, you will need to approve file reads. This is actually useful for this exercise because you see every file access explicitly.

Exercise 2: Spot the Difference

Objective

Compare manual work against agentic work on the same type of task. The goal is to calibrate your sense of where agents add value and where they introduce overhead. Not every task benefits from agentic tools, and this exercise helps you find the boundary.

Steps

  1. Pick a small, concrete task. Choose something you can complete manually in 5-15 minutes. Good candidates:

    • Add a .gitignore file appropriate for your project's language and framework.
    • Rename a function or variable across the codebase.
    • Add a new dependency and import it in the relevant file.
    • Add a missing error handler to an existing endpoint.
    • Write a simple unit test for an existing function.
  2. Do it manually first. Complete the task by hand, the way you normally would. Time yourself (roughly — do not stress precision). Note the steps: which files you opened, what you searched for, what you had to look up.

  3. Reset your changes. Use git checkout . or git stash to undo your manual work so the codebase is back to its original state.

  4. Ask the agent to do a similar task. Describe the task in natural language. For example:

    Add a .gitignore file for this project. It's a Node.js project using
    TypeScript. Include common entries for node_modules, build output,
    editor files, and OS-specific files.
    

    Time the agent's work as well. Watch its process.

  5. Compare. Evaluate along these dimensions:

    • Speed: Which approach was faster end-to-end? (Include the time to review the agent's work.)
    • Accuracy: Did the agent's result match or exceed your manual result? Were there mistakes?
    • Completeness: Did the agent include things you forgot or did not think of?
    • Effort: How much mental energy did each approach require from you?
    • Surprise factor: Did anything about the agent's approach surprise you — either positively or negatively?
  6. Write a brief comparison (a few sentences for each dimension). Be honest. If the manual approach was better for this task, say so and note why.

Expected Outcome

For simple, well-defined tasks (like adding a .gitignore), the agent will likely be comparable in speed and may be more thorough (including entries you would not have thought of). For tasks requiring project-specific judgment (like renaming a function with semantic implications), you may find the manual approach more precise. The key insight is that the agent's value increases with the mechanical complexity of the task.

Hints

  • Choose a task you actually know how to do. This exercise is about comparison, not about learning something new with the agent's help.
  • If the agent makes a mistake, do not fix it manually. Ask the agent to fix it. Observe how it handles corrections — this is part of the agentic workflow.
  • If you find the manual approach was clearly faster, that is a valid and useful data point. Some tasks are better done manually. The skill is knowing which ones.

Exercise 3: Agentic vs. Non-Agentic

Objective

Build the judgment muscle for choosing the right mode of AI assistance. Given a set of real tasks from your own work, categorize each one by the mode that fits best: completion (autocomplete/copilot), chat (ask a question), or agentic (autonomous multi-step). This is the most important skill in this module — knowing when to reach for which tool.

Steps

  1. List 5 tasks from your recent work. Look at your last week or two of development. Pull out 5 concrete tasks you actually did (or need to do). Write each one as a short description. Examples:

    • "Fixed a typo in an error message."
    • "Added pagination to the /users endpoint."
    • "Investigated why the CI build was failing intermittently."
    • "Refactored the authentication middleware to use the new token format."
    • "Wrote API documentation for the billing endpoints."
  2. For each task, categorize it. Assign it to one of:

    • Completion — best handled by autocomplete/copilot suggestions as you type.
    • Chat — best handled by asking a question and getting an answer or snippet.
    • Agentic — best handled by an autonomous agent that reads, edits, runs, and iterates.
  3. Explain your reasoning. For each task, write 2-3 sentences explaining why you chose that category. Consider:

    • How many files are involved?
    • Is the task well-defined or ambiguous?
    • Does it require running commands to verify?
    • Is there a lot of mechanical work (boilerplate, repetition, cross-file changes)?
    • Does it require deep project context or is it self-contained?
  4. Identify the borderline cases. At least one of your 5 tasks should be a close call — a task where two modes seem roughly equal. Explain what would tip it one way or the other.

  5. Share and discuss. If you are going through this curriculum with others, compare your categorizations. Disagreements are productive — they reveal different mental models about where agents add value.

Expected Outcome

You should have a concrete, personally relevant framework for choosing the right mode of AI assistance. The general pattern that emerges:

  • Completion fits single-line or few-line tasks where you already know what you want to write: finishing a function signature, writing a quick conditional, filling in boilerplate.
  • Chat fits questions and isolated problems: "How do I parse a date in this format?", "What's wrong with this regex?", "Explain this error message."
  • Agentic fits multi-step tasks that touch multiple files and benefit from running commands: adding features, refactoring, writing tests, investigating bugs, making cross-cutting changes.

The borderline cases are the most instructive. A task like "add a simple helper function" might be completion-level if it is one function in one file, chat-level if you are unsure about the approach, or agentic if it requires updating imports and tests across multiple files.

Hints

  • Be specific about the tasks. "Build a feature" is too vague. "Add email validation to the signup form, including error messages and a unit test" is concrete enough to categorize meaningfully.
  • If you find most of your tasks fall into one category, deliberately think of tasks from a different category. The goal is to populate all three buckets so you can see the boundaries.
  • There is no single right answer. Reasonable people will categorize the same task differently based on their familiarity with the codebase, their typing speed, and their comfort with the tools. The value is in the reasoning, not the label.

title: "Your First Hour" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: [00-what-is-agentic-dev]

Your First Hour

Going from install to productive in 60 minutes.

The First-Hour Framework

Your first hour with an agentic coding tool follows a natural arc. Resist the urge to skip ahead. Each phase builds on the previous one, and rushing through them creates bad habits that compound over weeks and months.

The framework has five phases:

  1. Install (~5 minutes) — Get the tool running. This is the easy part.
  2. Orient (~10 minutes) — Let the agent explore your codebase. Watch what it does.
  3. Configure (~5 minutes) — Write a minimal configuration file. Three to four lines is enough.
  4. Build (~30 minutes) — Complete a real but low-risk task with the agent's help.
  5. Reflect (~10 minutes) — Review what happened. Adjust your mental model.

This is not a rigid schedule. Some phases will take longer, some shorter. The point is that all five phases happen, in order, during your first session. Skip any one of them and you will form incomplete mental models that slow you down later.

Why the First Hour Matters

The first hour is not about productivity. It is about calibration.

During this hour, you are building an internal sense of what the agent is good at, where it struggles, how fast it works, and what it costs. These impressions will shape every decision you make with the tool going forward — which tasks you delegate, how much context you provide, how carefully you review output.

Get this calibration wrong and you will either over-trust the agent (leading to subtle bugs you don't catch) or under-trust it (leading to micromanagement that defeats the purpose of using it). Both failure modes are common, and both originate in a poorly structured first session.

Habits form fast. If your first interaction is typing vague prompts and accepting output without review, that becomes your default. If your first interaction is thoughtful, structured, and review-oriented, that becomes your default instead. Choose deliberately.

The Critical Mistake: Toy Projects

Most people start with a toy project. They create a fresh directory, ask the agent to build a to-do app or a calculator, and marvel at the output. This feels productive. It is not.

Toy projects tell you almost nothing about how the agent will perform on your actual work. They are too simple, too well-represented in training data, and too far removed from the messy reality of production codebases. An agent that writes a perfect calculator from scratch may flounder when asked to add a feature to your 50,000-line monorepo with custom conventions, unusual architecture, and undocumented edge cases.

Worse, toy projects give you a distorted sense of the agent's capabilities. You walk away thinking "this thing is amazing" or "this thing is useless" based on evidence that does not transfer to your real work.

Use your real project from minute one. Pick a codebase you already understand — something you work on daily or weekly. The agent's output will be more meaningful because you can instantly evaluate whether it is correct. You will spot hallucinations, incorrect assumptions, and style violations because you know what right looks like. This is exactly the kind of feedback loop that builds accurate calibration.

The Orient Phase

Before asking the agent to change anything, ask it to read.

Give the agent a broad exploratory prompt: "Read this project and summarize the architecture" or "Explain how this codebase is organized." Then watch what happens. Pay attention to which files it reads, which ones it skips, and whether its summary matches your understanding.

This phase serves two purposes. First, it gives the agent context. Agentic tools work dramatically better when they have oriented themselves in the codebase before attempting modifications. Second, it gives you context — about the agent. You learn how it navigates code, what it pays attention to, and where its understanding diverges from yours.

If the agent's summary is wildly off, that tells you something important: either the codebase is unusual enough to need more guidance, or you need to provide better orientation hints. Both are valuable signals before you ask it to write code.

Do not skip this phase. The ten minutes you spend here save you thirty minutes of confused debugging later.

The Configure Phase

Agentic coding tools look for project-level configuration files — CLAUDE.md for Claude Code, AGENTS.md for Codex CLI. These files give the agent persistent context about your project: what language you use, what conventions you follow, how to build and test.

Many people skip this step entirely, planning to "set it up properly later." Others go overboard, writing pages of detailed instructions before they have any experience with what the agent actually needs to know.

Both approaches are wrong. Start with a minimal configuration — three to four lines that cover the absolute basics:

  • What the project is (one sentence)
  • What language and framework it uses
  • One or two critical conventions (e.g., "use tabs not spaces" or "all functions must have docstrings")

That is it. You will expand this file over time as you discover what the agent gets wrong without guidance. But you need something in place from the start, because even a minimal configuration file measurably improves output quality. The agent stops guessing about basics and focuses its reasoning on the actual problem.

Think of it as a two-line README for your AI collaborator. You would not onboard a new teammate without any context. Do not onboard your agent without any either.

The Build Phase

Now you build something. But choose your task carefully.

The ideal first task has these properties:

  • Real: It comes from your actual backlog, not a contrived exercise. A real bug to fix, a real test to add, a real function to document.
  • Low-risk: If the agent gets it wrong, nothing catastrophic happens. Adding a test is low-risk. Refactoring your authentication system is not.
  • Verifiable: You can check the result quickly. A test either passes or it doesn't. A bug fix either resolves the issue or it doesn't.
  • Small: It should take the agent 5-15 minutes, not 45. You want time to iterate and reflect.

Good first tasks include:

  • Add a unit test for an existing function
  • Fix a small, well-understood bug
  • Add type hints or docstrings to a module
  • Implement a simple utility function that is on your to-do list

Bad first tasks include:

  • Build an entire new feature
  • Refactor a critical system
  • Migrate to a new framework
  • Anything involving secrets, credentials, or production data

During this phase, pay attention to the agent's process, not just its output. How does it break the problem down? What files does it read before making changes? Does it ask clarifying questions or charge ahead? Does it test its own work? These observations are more valuable than the code it produces.

The Reflect Phase

When the task is done — or when you have decided to stop — take ten minutes to reflect. This is where calibration actually happens.

Review every change the agent made. Use git diff to see exactly what was modified. Ask yourself:

  • Correctness: Is the code right? Not just "does it run" but "would I approve this in a code review?"
  • Style: Does it match your project's conventions? Did the agent pick up on patterns in your existing code?
  • Scope: Did the agent do what you asked, or did it do more? Scope creep from agents is common and sometimes subtle.
  • Process: How much back-and-forth did it take? Did you have to correct the agent, and if so, about what?

Write down your observations. Not a formal document — a few bullet points in a scratch file is fine. What worked well? What surprised you? What would you do differently next time? What should you add to your configuration file?

This reflection is not optional busywork. It is the mechanism by which your first hour converts from an experience into a skill. Without it, you will repeat the same mistakes in your second hour.

Cost Awareness

Your first hour will cost roughly $1-5 in API tokens. This is normal.

The orient phase is the most expensive part because the agent reads many files. The build phase varies depending on the complexity of your task and how many iterations it takes. Configuration and reflection cost almost nothing.

Do not optimize for cost during your first hour. The goal is calibration, not efficiency. Once you have a clear sense of what works, you will naturally become more cost-effective — providing better prompts, catching issues earlier, choosing tasks that play to the agent's strengths. Premature cost optimization leads to under-utilizing the tool, which is far more expensive than a few extra dollars in tokens.

That said, develop an awareness of cost from the start. Most tools show token usage or cost estimates somewhere in their interface. Glance at these numbers periodically. You are building an intuition for what "expensive" and "cheap" interactions look like, and that intuition will serve you well as you scale up your usage.

Key Takeaways

  • Follow the five-phase framework: install, orient, configure, build, reflect.
  • Use your real project from the very first session. Toy projects teach you nothing transferable.
  • Let the agent read before it writes. Orientation dramatically improves output quality.
  • Start with a minimal configuration file. Three lines now beats thirty lines next month.
  • Choose a real, low-risk, verifiable, small task for your first build.
  • Review every change with git diff. This is where you calibrate trust.
  • Your first hour costs $1-5 in tokens. That is the price of calibration, and it is worth it.
  • Write down what you learned. Reflection converts experience into skill.

title: "Your First Hour — Claude Code" tested_with: claude-code: "1.0.x" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: [00-what-is-agentic-dev]

Your First Hour with Claude Code

A step-by-step walkthrough from installation to your first completed task. Every command below is copy-paste ready. Notes labeled Adapt this tell you what to change for your specific situation.


Step 1: Installation (~2 minutes)

Claude Code is distributed as an npm package. You need Node.js 18 or later.

npm install -g @anthropic-ai/claude-code

Verify the installation:

claude --version

You should see a version number starting with 1.0. If you get a "command not found" error, make sure your npm global bin directory is in your PATH.

Adapt this: If you use a Node version manager like nvm or fnm, make sure you are on Node 18+ before installing. Run node --version to check.


Step 2: API Key Setup (~2 minutes)

Claude Code needs an Anthropic API key. If you do not have one yet, create one at console.anthropic.com.

Set the key as an environment variable:

export ANTHROPIC_API_KEY="sk-ant-your-key-here"

To make this permanent, add the line to your shell profile:

echo 'export ANTHROPIC_API_KEY="sk-ant-your-key-here"' >> ~/.bashrc
source ~/.bashrc

Adapt this: Replace ~/.bashrc with ~/.zshrc if you use zsh, or ~/.config/fish/config.fish for fish shell (using set -gx syntax instead of export).


Step 3: Navigate to Your Project and Launch (~1 minute)

Go to a project you actively work on. Not a toy project — a real codebase you understand well.

cd ~/projects/your-project
claude

Claude Code starts an interactive session in your terminal. You will see a prompt where you can type natural language instructions.

Adapt this: Replace ~/projects/your-project with the actual path to your project. Choose something you know well enough to evaluate the agent's output.


Step 4: The Guided Tour — Orient the Agent (~10 minutes)

Your first command should ask Claude to explore, not to build.

Read this project and summarize the architecture. What are the key directories,
the primary language, and how the code is organized?

Watch what happens. Claude will read files — package.json, README.md, directory listings, key source files. It will produce a summary of what it finds.

Now evaluate that summary against your own knowledge. Is it accurate? Did it miss anything important? Did it identify the right entry points and core modules?

Follow up with a more targeted question:

What are the 3 most important files in this project and why?

Compare its answer to what you would say. Disagreements here are informative — they tell you what context the agent is missing, which will inform your configuration file in the next step.

Adapt this: If your project is very large (100k+ lines), narrow the scope: "Read the src/api directory and summarize how the API layer is organized."


Step 5: Create a Minimal CLAUDE.md (~5 minutes)

Create a CLAUDE.md file in the root of your project. This gives Claude persistent context across sessions. Start small — you will expand it later as you learn what the agent needs.

Create a CLAUDE.md file in the project root with the following content, but adapt
the details to match what you just learned about this project.

Here is a starter template. Write this yourself or ask Claude to draft it after the orient phase:

# CLAUDE.md

This is a [brief description, e.g., "Python REST API for managing inventory"].
Built with [language/framework, e.g., "Python 3.12 and FastAPI"].

## Key commands
- Run tests: [your test command, e.g., `pytest`]
- Start dev server: [your dev command, e.g., `uvicorn main:app --reload`]

## Conventions
- [One critical convention, e.g., "All functions must have type hints and docstrings"]

That is five lines of actual content. It is enough. Do not spend twenty minutes writing an exhaustive document. You will learn what else to add through experience.

Adapt this: Fill in your project's actual language, framework, test command, and one convention you care about most. If you are unsure, ask Claude: "Based on what you read, draft a 5-line CLAUDE.md for this project."


Step 6: Your First Real Task (~15 minutes)

Pick a small, real task. Here are two safe starting points:

Option A — Add a test:

Write a unit test for the [function_name] function in [file_path].
Cover the happy path and one edge case. Follow the testing patterns
already used in this project.

Option B — Fix a small bug:

There's a bug in [file_path]: [brief description of the bug].
Fix it and explain what was wrong.

Watch the agent work. It will read relevant files, reason about the problem, propose changes, and ask for your permission before writing to disk.

When Claude proposes changes, read them before approving. This is not a speed exercise — it is a calibration exercise. You are learning what "good agent output" looks like for your specific codebase.

Adapt this: Replace the bracketed placeholders with a real function, file, or bug from your project. If nothing comes to mind, try: "Find a function in this project that has no tests and write one."


Step 7: Understanding Permissions (~2 minutes)

Claude Code asks for permission before performing actions that modify your system. You will see prompts like:

  • File write permission: Claude wants to create or modify a file. You see the proposed changes and can approve or reject.
  • Command execution permission: Claude wants to run a shell command (e.g., npm test). You see the command and can approve or reject.

You can also configure permission behavior:

/permissions

This shows your current permission settings. For your first session, keep the defaults — approve everything manually. This forces you to read every change, which is exactly what you want during calibration.

As you gain confidence, you can allow certain operations automatically. But not yet.


Step 8: Useful Built-in Commands (~3 minutes)

Claude Code has several built-in commands you should know about. Type any of these at the prompt:

/help

Shows all available commands and keyboard shortcuts.

/status

Shows the current session state: how many tokens have been used, the current model, and other session metadata.

/cost

Shows token usage and estimated cost for the current session. Check this periodically to build cost awareness.

/compact

Compresses the conversation history to free up context window space. Useful during long sessions when you notice the agent starting to "forget" earlier context.

/clear

Clears the conversation history entirely. Use this when you want to start a fresh task within the same session.

Take a minute to run /help and scan the available commands. You do not need to memorize them now, but knowing they exist will save you time later.


Step 9: Ending and Starting Sessions (~2 minutes)

To end your session:

  • Press Ctrl+C to cancel the current operation (if one is running)
  • Press Ctrl+D or type /exit to leave the session entirely

Your CLAUDE.md file persists on disk, so the next session will pick it up automatically. Conversation history is also preserved between sessions by default.

To start a new session later:

cd ~/projects/your-project
claude

To start a session with a specific task (non-interactive mode):

claude "Run the test suite and report any failures"

This runs the task and exits automatically when done. Useful for quick checks.

Adapt this: Non-interactive mode is especially useful for tasks you find yourself repeating. Think about what those might be for your project.


Step 10: Reviewing What Changed (~10 minutes)

This is the most important step. Before you close your terminal, review everything the agent changed.

git diff

If Claude created new files that are not yet tracked:

git status
git diff --cached

Read every line of the diff. Ask yourself:

  • Is this code correct?
  • Does it match the project's style?
  • Did Claude change only what I asked it to change, or did it modify other things too?
  • Would I approve this in a code review?

If you are satisfied with the changes, commit them:

git add -p
git commit -m "Add test for [function_name] via Claude Code"

Using git add -p lets you stage changes interactively, hunk by hunk. This is a good habit: it forces you to review every change one more time before committing.

If you are not satisfied, you can discard the changes:

git checkout -- .

Or keep what you like and discard what you don't — this is why version control exists.

Adapt this: If you use a different version control system, adjust the commands accordingly. The principle is the same: review every change before accepting it.


Quick Reference

PhaseTimeKey Command
Install2 minnpm install -g @anthropic-ai/claude-code
API Key2 minexport ANTHROPIC_API_KEY="sk-ant-..."
Launch1 mincd ~/your-project && claude
Orient10 min"Read this project and summarize the architecture"
Configure5 minCreate a 5-line CLAUDE.md
Build15 min"Write a unit test for [function] in [file]"
Review10 mingit diff

Total estimated cost: $1-5 in API tokens for the full hour.


title: "Your First Hour — Codex CLI" tested_with: codex-cli: "0.2.x" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: [00-what-is-agentic-dev]

Your First Hour with Codex CLI

A step-by-step walkthrough from installation to your first completed task. Every command below is copy-paste ready. Notes labeled Adapt this tell you what to change for your specific situation.


Step 1: Installation (~2 minutes)

Codex CLI is distributed as an npm package. You need Node.js 18 or later.

npm install -g @openai/codex

Verify the installation:

codex --version

You should see a version number starting with 0.2. If you get a "command not found" error, check that your npm global bin directory is on your PATH.

Adapt this: If you use nvm or fnm, ensure you are on Node 18+ before installing. Run node --version to check.


Step 2: API Key Setup (~2 minutes)

Codex CLI uses the OpenAI API. You need an OpenAI API key. Create one at platform.openai.com if you do not have one.

Set the key as an environment variable:

export OPENAI_API_KEY="sk-your-key-here"

To make this permanent, add the line to your shell profile:

echo 'export OPENAI_API_KEY="sk-your-key-here"' >> ~/.bashrc
source ~/.bashrc

Adapt this: Replace ~/.bashrc with ~/.zshrc for zsh, or use set -gx syntax for fish shell.


Step 3: Navigate to Your Project and Launch (~1 minute)

Go to a project you actively work on. A real codebase, not a tutorial project.

cd ~/projects/your-project
codex

Codex CLI starts an interactive session. You will see a prompt where you can type natural language instructions.

Adapt this: Replace ~/projects/your-project with the actual path to your codebase. Pick something you know well enough to judge the agent's output.


Step 4: The Guided Tour — Orient the Agent (~10 minutes)

Start by asking Codex to explore, not to build.

Explain this project's structure. What are the key directories, the primary
language, and how is the code organized?

Codex will examine your file tree, read key files, and produce a summary. Evaluate that summary against your own knowledge. Where is it accurate? Where does it miss the mark?

Follow up:

What are the 3 most critical files in this project? Explain why each matters.

The point is not to get a perfect answer. The point is to see how the agent reasons about unfamiliar code and to identify what context it is missing. Those gaps will inform your configuration file next.

Adapt this: For very large projects, narrow the scope: "Explain the structure of the src/services directory and how the services interact."


Step 5: Create a Minimal AGENTS.md (~5 minutes)

Create an AGENTS.md file in the root of your project. Codex CLI reads this file for persistent project context. Start with the bare minimum:

# AGENTS.md

## Project overview
This is a [brief description, e.g., "TypeScript CLI tool for database migrations"].
Built with [language/framework, e.g., "TypeScript 5.x and Node.js"].

## Commands
- Install: [e.g., `npm install`]
- Test: [e.g., `npm test`]
- Build: [e.g., `npm run build`]

## Conventions
- [One key convention, e.g., "Use named exports, not default exports"]

Keep it short. Five to seven lines of real content. You will add more as you discover what the agent gets wrong without explicit guidance.

Adapt this: Fill in your project's actual details. If you are unsure what to include, ask Codex: "Based on what you've seen, what should an AGENTS.md for this project contain?" Use its answer as a starting draft and edit it down.


Step 6: Understanding Approval Modes (~5 minutes)

Codex CLI has three approval modes that control how much autonomy the agent has. This is the most important concept to understand before you start building.

Suggest Mode (the default)

codex --approval-mode suggest

The agent proposes changes but writes nothing to disk without your explicit approval. Every file edit, every command execution requires a "yes" from you. This is the safest mode and the right starting point.

Auto-Edit Mode

codex --approval-mode auto-edit

The agent can read and write files without asking, but still requires approval before running shell commands. Use this when you trust the agent's code changes but want to control what gets executed.

Full-Auto Mode

codex --approval-mode full-auto

The agent reads, writes, and executes commands without asking. Everything runs inside a sandboxed environment (more on this in Step 9). Use this only for well-understood, low-risk tasks once you have calibrated your trust.

For your first hour, stay in suggest mode. You want to see and approve every change. This is how you build an accurate mental model of what the agent does.


Step 7: Your First Real Task in Suggest Mode (~15 minutes)

Pick a small, real task from your project. Start in suggest mode so you see everything before it happens.

Option A — Add a test:

Write a unit test for the [function_name] function in [file_path].
Cover the normal case and one edge case. Match the testing style
already used in this project.

Option B — Fix a small bug:

In [file_path], there's a bug where [brief description].
Fix it and explain the root cause.

Option C — Add documentation:

Add JSDoc comments to all exported functions in [file_path].
Follow the documentation style used elsewhere in this project.

When Codex proposes a change, you will see a diff. Read it carefully before approving. This is calibration, not a race.

If the proposed change is wrong, reject it and provide more context:

That's not quite right. The function should [clarification]. Try again.

Adapt this: Replace the bracketed placeholders with real files and functions from your project. If nothing comes to mind, try: "Find a function in this project that lacks tests and write one."


Step 8: Switching to Auto-Edit for a Known-Safe Task (~5 minutes)

Once you have completed a task in suggest mode and feel comfortable with the agent's judgment, try auto-edit mode for a simple, safe task.

Start a new task with auto-edit:

codex --approval-mode auto-edit

Then give it something low-risk where file changes are fine but you want to control execution:

Add type annotations to all functions in [file_path] that are currently untyped.
Do not change any logic, only add types.

Notice the difference in flow. Codex will modify files without asking but will still prompt you before running any commands like tests or linters. This is a good middle ground for tasks where the changes are straightforward and you mainly care about what gets executed.

Adapt this: Choose a file where type annotations or docstrings are missing. This is the safest class of auto-edit task because it is additive — nothing existing is modified.


Step 9: The Sandbox — What Runs Where and Why (~3 minutes)

Codex CLI runs shell commands inside a sandboxed environment. This is a critical safety feature, especially in full-auto mode.

The sandbox restricts:

  • Network access: Commands cannot make outbound network requests by default. This prevents accidental data exfiltration or unintended API calls.
  • File system access: Commands can only access files within your project directory. They cannot read or modify files outside the project root.
  • Process isolation: Commands run in an isolated context that limits what system resources they can touch.

In suggest and auto-edit modes, you approve each command before it runs, so the sandbox is a secondary safety net. In full-auto mode, the sandbox is your primary safety net.

To see the current sandbox configuration:

What sandbox restrictions are currently in place?

For your first hour, the defaults are fine. The key takeaway is that Codex is designed with defense in depth: approval modes control what the agent can do, and the sandbox controls what happens when commands actually execute.


Step 10: Reviewing Changes and Ending the Session (~10 minutes)

Before you close anything, review every change the agent made.

git diff

For new untracked files:

git status

Read every line of the diff. Ask yourself:

  • Is this code correct? Would it pass a code review?
  • Does it match the project's existing style and conventions?
  • Did the agent stay within the scope of what I asked?
  • Are there any changes I did not expect or did not request?

If the changes look good:

git add -p
git commit -m "Add tests for [function_name] via Codex CLI"

Using git add -p stages changes interactively, giving you one final review before committing.

If the changes are not right, discard them:

git checkout -- .

To end the Codex CLI session, press Ctrl+C or Ctrl+D.

Your AGENTS.md file is saved on disk and will be read automatically the next time you start a session in this directory.


Quick Reference

PhaseTimeKey Command
Install2 minnpm install -g @openai/codex
API Key2 minexport OPENAI_API_KEY="sk-..."
Launch1 mincd ~/your-project && codex
Orient10 min"Explain this project's structure"
Configure5 minCreate a 5-line AGENTS.md
Build15 min"Write a unit test for [function] in [file]"
Review10 mingit diff

Approval mode cheat sheet:

ModeFile editsShell commandsBest for
suggestAsk firstAsk firstLearning, high-risk tasks
auto-editAutomaticAsk firstRoutine edits, trusted changes
full-autoAutomaticAutomaticWell-understood, sandboxed tasks

Total estimated cost: $1-5 in API tokens for the full hour.


title: "Exercises — Your First Hour" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: [00-what-is-agentic-dev]

Exercises: Your First Hour

These exercises use your real project, not a toy example. Each one builds on the previous, so complete them in order. Budget about 45 minutes total.


Exercise 1: The Guided Tour

Objective: Evaluate how well the agent understands your codebase without any configuration, and calibrate your expectations for its reasoning about unfamiliar code.

Steps

  1. Open a terminal and navigate to a project you actively work on.

  2. Start a session with your chosen tool (claude or codex).

  3. Enter this prompt:

    Read this project and give me a high-level summary. What is it, how is it
    organized, and what are the key architectural decisions?
    
  4. Read the agent's response carefully. Note where it is accurate and where it is off.

  5. Now ask a follow-up:

    What are the 3 most important files in this project? For each one,
    explain why it matters.
    
  6. Write down the agent's answer. Then write down your own answer — the 3 files you consider most important.

  7. Compare the two lists. Where do they agree? Where do they differ?

Expected Outcome

The agent should produce a broadly accurate summary but may miss domain-specific nuances, undocumented conventions, or non-obvious architectural decisions. Its "top 3 files" list will likely overlap with yours on 1-2 files but diverge on at least one.

The divergences are the most valuable part. Each one represents context that the agent is missing — context you may want to put in your configuration file later.

Hints

  • If your project is very large, scope the question to a specific directory or module. An agent trying to summarize a monorepo with 500 files will produce a shallow summary. Narrowing the scope gets you deeper, more evaluable output.
  • If the agent's summary is wildly inaccurate, that itself is a useful signal. Ask it: "What files did you look at to reach that conclusion?" This reveals its reasoning path and helps you understand the failure mode.
  • Do not correct the agent during this exercise. You are assessing its unguided performance. Corrections come later.

Exercise 2: Your First Real Task

Objective: Complete a genuine task from your backlog using the agent, measure the time it takes, and rigorously review the output.

Steps

  1. Before starting the agent, pick a small task from your actual work. Good candidates:

    • A unit test that should exist but doesn't
    • A small bug you have been meaning to fix
    • A function that needs documentation or type annotations
    • A simple utility or helper that is on your to-do list
  2. Write the task down in one sentence before you start the agent. This is your "spec."

  3. Start a session and give the agent your task. For example:

    Write a unit test for the `parseConfig` function in src/config/parser.ts.
    It should test: valid input returns expected output, empty input throws
    an error, and malformed input is handled gracefully.
    
  4. Start a timer when you press Enter.

  5. Work with the agent until the task is done. If the agent asks questions, answer them. If it proposes incorrect changes, tell it what is wrong and let it try again.

  6. Stop the timer when you consider the task complete.

  7. Run git diff and review every line of the change.

Expected Outcome

A working, committed change that accomplishes your task. Your review should confirm that the code is correct, matches your project's style, and stays within the scope you defined.

Typical timing for a first task: 10-20 minutes including review. If it takes longer than 30 minutes, the task was likely too complex for a first attempt. Note this and choose something smaller next time.

Hints

  • Be specific in your initial prompt. "Write a test" is too vague. "Write a unit test for function X in file Y covering cases A, B, and C" gives the agent clear constraints.
  • If the agent produces something close but not quite right, resist the urge to fix it yourself. Instead, describe what is wrong and let the agent iterate. Learning to give effective feedback is a core agentic development skill.
  • If the agent gets stuck in a loop — producing the same wrong answer repeatedly — that is a signal to step in. Give it more context, rephrase the problem, or point it to a specific file or pattern it should follow.
  • Record your total time. You will use this as a baseline to measure improvement in later modules.

Exercise 3: The Minimal CLAUDE.md

Objective: Create a minimal project configuration file and measure whether it improves the agent's output quality on a task you have already completed.

Steps

  1. Create a configuration file in your project root. Use CLAUDE.md for Claude Code or AGENTS.md for Codex CLI.

  2. Write exactly 5 lines of content (not counting the heading). Use this template:

    # CLAUDE.md
    
    [Project name] is a [one-sentence description].
    Built with [primary language] and [primary framework/library].
    Run tests with: `[your test command]`
    Run the dev server with: `[your dev command]`
    [One convention, e.g., "Always use descriptive variable names and add docstrings to public functions."]
    
  3. Save the file.

  4. Start a new session (so the agent picks up the configuration file).

  5. Run the exact same task you completed in Exercise 2. Use the same prompt, word for word.

  6. Compare the output to your Exercise 2 result:

    • Did the agent follow your stated conventions?
    • Did it use the correct test command or build command?
    • Did the code style match your project better than before?
    • Was there any other noticeable difference in quality?
  7. Write down your observations in 3-4 bullet points.

Expected Outcome

You should notice at least one concrete improvement. Common improvements include: the agent uses the correct testing framework without being told, it follows your naming conventions more consistently, or it structures code in a way that better matches your project's patterns.

Some tasks will show dramatic improvement. Others will show only marginal difference. Both outcomes are informative. Tasks that improve a lot tell you what context the agent was missing. Tasks that improve little tell you the agent was already inferring that context from your codebase.

Hints

  • Use exactly 5 lines. The temptation is to write more, but the exercise is specifically designed to measure the impact of minimal configuration. You will add more lines in later modules as you identify specific gaps.
  • Make sure you start a new session after creating the file. If you add the file mid-session, the agent may not read it until the next session (behavior varies by tool).
  • If you do not see any improvement, that is still a useful result. It means either your task was simple enough that configuration did not matter, or the agent was already inferring the right context from your code. Try a different task to see if the pattern holds.
  • Keep your observations from this exercise. You will refer back to them in Module 03 when we cover advanced configuration strategies.

After You Finish

You have now completed the core loop of agentic development: orient, configure, build, review. Everything in the rest of this curriculum builds on this foundation.

Before moving on, make sure you can answer these questions:

  1. How accurate was the agent's first impression of your codebase?
  2. How long did your first real task take, and was the output correct?
  3. Did a minimal configuration file make a measurable difference?

If you have clear answers to all three, you are ready for Module 02.


title: "Project Memory" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: [01-your-first-hour]

Project Memory

The Core Question: How Do Agents Understand Your Project?

Every time you start a new session with an AI coding agent, you face the same problem: the agent knows nothing about your project. It does not know your directory structure. It does not know that your team uses Zustand instead of Redux, that your API responses always follow a specific envelope format, or that there is a legacy module in src/billing/ that must never be refactored without a product manager's sign-off.

The agent starts cold. Every single session.

This is the fundamental challenge of agentic development: your AI collaborator has powerful general knowledge of programming, but zero specific knowledge of your codebase. Without help, it will guess at your conventions, produce code that works but does not fit, and force you into repetitive corrections.

Project memory solves this. It is the collective set of configuration files — CLAUDE.md, AGENTS.md, and their supporting directories — that give an agent persistent context about your project. Think of it as the onboarding document you would write for a new team member who is brilliant but has never seen your codebase.

Project Memory as Compound Interest

A well-crafted project memory file is the single highest-leverage investment in agentic development. Here is why: every configuration line you write pays off not once, but across every session, every task, and every team member who uses an agent on your project.

Consider the math. Suppose you spend 30 minutes writing a CLAUDE.md that saves you 2 minutes of corrections per session. If you run 10 sessions per day, that is 20 minutes saved daily — and your initial investment pays for itself in under two days. Over a month, you have saved roughly 7 hours. Over a year, you have saved weeks.

Now multiply that across your team. A shared project memory file means every developer gets those savings from day one, without each person independently discovering the same conventions through trial and error.

The compounding goes deeper. As your project memory improves, the agent produces better code on the first attempt. Better first attempts mean you trust the agent with larger tasks. Larger tasks mean greater productivity gains. The virtuous cycle accelerates.

What Goes in Project Memory

Effective project memory answers five questions an agent would ask on its first day:

1. What is this project? A one-to-two sentence overview. Not a marketing pitch — a technical summary. "This is a Next.js 14 e-commerce platform using App Router, Postgres via Drizzle ORM, and Stripe for payments."

2. How is the code organized? Key directories and their purpose. The agent needs a mental map of where things live. You do not need to list every folder — just the ones that matter for navigation.

3. What conventions should I follow? Naming patterns, formatting rules, preferred libraries, architectural patterns. These are the instructions that prevent the agent from writing technically correct code that looks nothing like the rest of your codebase.

4. How do I run things? Build commands, test commands, deploy steps, common development tasks. Agents frequently need to verify their work, and they cannot do that if they do not know how to run your test suite.

5. What should I avoid? Anti-patterns specific to your project. Legacy modules that should not be touched. Libraries that look like good fits but cause problems. Patterns that have been tried and rejected. This section prevents the agent from repeating your team's past mistakes.

The Hierarchy: Repo, Directory, File

Project memory is not a single flat file. It operates as a hierarchy:

  • Repo-level instructions (project root) apply everywhere. This is where your project overview, global conventions, and common tasks live.
  • Directory-level instructions (subdirectories) override or extend repo-level guidance for specific parts of the codebase. Your api/ directory might have different conventions than your frontend/ directory.
  • File-level instructions come from inline comments within your code — these are the most specific and the most rare.

This hierarchy means you can keep your root configuration concise while adding specific guidance exactly where it is needed. A monorepo with a Python backend and a React frontend does not need to cram both sets of conventions into a single file.

What NOT to Put in Project Memory

The most common mistake is writing too much. Project memory is loaded into the agent's context window at the start of every session. Every line consumes tokens — the limited resource that determines how much the agent can process in a single session.

Do not duplicate your README. Your README is for humans browsing GitHub. Project memory is for agents working in your code. There is overlap, but they serve different audiences.

Do not write a novel. A 500-line configuration file means the agent has 500 fewer lines of capacity for your actual task. Be ruthless about conciseness.

Do not document obvious things. You do not need to tell the agent to "write clean code" or "handle errors properly." It already knows these things. Focus on what is specific to your project.

Do not include temporary instructions. "Fix the bug in auth.ts" is a task, not a convention. Keep project memory focused on durable knowledge that will be true next month.

The 80/20 Rule

Eighty percent of the value from project memory comes from twenty percent of the content. Specifically:

  1. A clear project overview (1-2 sentences) — this anchors everything else
  2. Three to four key conventions — the ones you would correct most often without them
  3. A build/test command reference — so the agent can verify its own work

If you write nothing else, write these. A 10-line file covering these basics will outperform a 200-line file that tries to document everything but buries the important bits in noise.

Evolving Your Config Over Time

Do not try to write the perfect project memory file upfront. You cannot anticipate what the agent will get wrong. Instead:

  1. Start minimal. Write a 5-10 line file with your project overview and the conventions you care about most.
  2. Use the agent. Give it real tasks on your project.
  3. Notice the corrections. When you find yourself telling the agent the same thing twice, that is a signal to add it to your config.
  4. Add one thing at a time. After each session, add the single most impactful instruction you wish the agent had known.
  5. Prune ruthlessly. If an instruction is not earning its token cost, remove it.

This iterative approach produces a config that reflects your actual needs rather than your assumptions about what an agent might need.

The Aspirational Trap

A subtle but damaging mistake is writing instructions that describe how you wish your project worked rather than how it actually works. If your codebase is full of class components but you write "Always use functional components with hooks," the agent will produce code that is stylistically inconsistent with everything around it.

Write instructions for the project you have, not the project you want. If you are migrating from one pattern to another, say so explicitly: "New components should use functional components with hooks. Existing class components in src/legacy/ should not be refactored without explicit instruction."

Context Windows and Token Budgets

Project memory is always loaded — it consumes context window space whether or not it is relevant to the current task. This is by design: the agent needs your conventions for every task. But it means that a bloated configuration file directly reduces the agent's capacity for complex work.

A rough guideline: keep your root-level project memory file under 100 lines. If you need more than that, use the directory-level hierarchy to put specialized instructions closer to where they are needed. This way, the agent only loads the subset of instructions relevant to the part of the codebase it is working in.

Key Takeaways

  • Every agent session starts cold. Project memory is how you give the agent persistent knowledge about your codebase.
  • Start with 5-10 lines, not 200. A concise file that covers your project overview and top conventions delivers most of the value.
  • Add instructions reactively. When you correct the agent twice for the same thing, add it to your config.
  • Use the hierarchy. Repo-level for global conventions, directory-level for local overrides.
  • Keep it honest. Document what your project actually does, not what you wish it did.

title: "Project Memory — Claude Code" tested_with: claude-code: "1.0.x" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: [01-your-first-hour]

Project Memory in Claude Code

Claude Code uses CLAUDE.md files as its project memory system. These files are loaded automatically at the start of every session, giving the agent persistent context about your project's architecture, conventions, and workflows without you repeating yourself.

Where CLAUDE.md Lives

CLAUDE.md can exist at multiple levels in your project:

  • Project root (./CLAUDE.md) — loaded in every session. This is your primary configuration file.
  • Subdirectories (./src/api/CLAUDE.md) — loaded when the agent works in that directory. Use these for module-specific conventions.
  • Home directory (~/.claude/CLAUDE.md) — loaded in every project. Use this sparingly, for personal preferences that apply everywhere.

Note: The filename must be exactly CLAUDE.md (uppercase). Claude Code will not recognize claude.md or Claude.md.

How Claude Code Loads It

When you start a session, Claude Code automatically:

  1. Reads CLAUDE.md from your project root
  2. Reads any CLAUDE.md files in the current working directory and its parents
  3. Reads ~/.claude/CLAUDE.md if it exists
  4. Merges all instructions, with more specific files taking precedence

You do not need to reference, import, or include these files. They are loaded silently and automatically. The agent follows the instructions without you needing to mention them in your prompts.

The Anatomy of a Great CLAUDE.md

A well-structured CLAUDE.md has five sections. Not every project needs all five, but this is the framework to build from.

Project Overview

One to two sentences of technical context. This is not your elevator pitch — it is the minimum an agent needs to understand what it is working on.

## Project Overview

SaaSKit is a Next.js 14 multi-tenant SaaS starter using App Router, Drizzle ORM with Postgres, and Stripe for billing. The app serves ~2000 DAU in production.

Architecture

Key directories and what lives in them. Focus on navigation — where should the agent look for things?

## Architecture

- `src/app/` — Next.js App Router pages and layouts
- `src/components/` — Shared React components (barrel exports via index.ts)
- `src/lib/` — Business logic, utilities, and third-party wrappers
- `src/db/` — Drizzle schema, migrations, and seed data
- `src/api/` — tRPC routers and procedures
- `tests/` — Vitest unit and integration tests (mirrors src/ structure)

Conventions

The rules that make your code look like it belongs in your project. This section prevents the most corrections.

## Conventions

- TypeScript strict mode — no `any` types, no `@ts-ignore`
- Use named exports, not default exports
- React components: functional components with arrow syntax
- State management: Zustand stores in `src/stores/`, never prop drill more than 2 levels
- API calls: always go through tRPC hooks, never raw fetch
- Error handling: use Result types from `src/lib/result.ts`, not try/catch in business logic
- File naming: kebab-case for files, PascalCase for components

Common Tasks

How to build, test, and deploy. The agent uses these to verify its work.

## Common Tasks

- **Dev server**: `pnpm dev` (port 3000)
- **Run tests**: `pnpm test` (Vitest, runs in ~30s)
- **Run single test**: `pnpm test -- path/to/test.ts`
- **Type check**: `pnpm typecheck`
- **Lint**: `pnpm lint` (ESLint + Prettier)
- **DB migrations**: `pnpm db:push` (development), `pnpm db:migrate` (production)
- **Seed data**: `pnpm db:seed`

What NOT to Do

Anti-patterns specific to your project. This section is surprisingly high-value — it prevents the agent from making mistakes your team has already learned from.

## What NOT to Do

- Do NOT use the `prisma` package — we migrated to Drizzle in Q3. The old Prisma schema still exists but is not used.
- Do NOT modify files in `src/legacy/billing/` — these are being deprecated and have complex Stripe webhook dependencies.
- Do NOT use `className` strings directly — always use the `cn()` utility from `src/lib/utils.ts`.
- Do NOT add new environment variables without adding them to `src/env.ts` (Zod validation).

Starter CLAUDE.md Template

Copy this into your project root and customize it. This takes five minutes and delivers immediate value.

# Project Instructions

## Project Overview

<!-- One to two sentences: what is this project, what tech stack, any key context. -->

## Architecture

<!-- List 4-6 key directories and their purpose. -->

## Conventions

<!-- List 3-5 coding conventions that matter most in this project. -->

## Common Tasks

- **Install**: `<!-- your install command -->`
- **Dev server**: `<!-- your dev command -->`
- **Run tests**: `<!-- your test command -->`
- **Lint**: `<!-- your lint command -->`

## What NOT to Do

<!-- List 2-3 project-specific anti-patterns. -->

Intermediate Example: Real-World CLAUDE.md

Here is a more complete example showing what a CLAUDE.md looks like after a few weeks of iterative refinement:

# Project Instructions

## Project Overview

Conveyor is an internal tool for managing our ML model deployment pipeline. Python 3.12, FastAPI backend, React 18 frontend in a monorepo. ~15 engineers actively contributing.

## Architecture

- `api/` — FastAPI application (routers in `api/routers/`, services in `api/services/`)
- `api/models/` — SQLAlchemy ORM models (Postgres)
- `api/schemas/` — Pydantic request/response schemas (separate from ORM models)
- `web/` — React SPA (Vite, TanStack Router, TanStack Query)
- `web/src/components/` — Shared components using Radix UI primitives
- `infra/` — Pulumi IaC (Python) for AWS deployment
- `scripts/` — Development and CI helper scripts
- `tests/` — Pytest tests mirroring api/ structure; Vitest tests in web/

## Conventions

- Python: Ruff for formatting and linting, strict type hints on all public functions
- Python imports: use absolute imports from project root, never relative
- FastAPI: every router function must have explicit response_model
- Pydantic schemas: always inherit from our BaseSchema in `api/schemas/base.py`
- React: use TanStack Query for all server state, Zustand for client state only
- React components: named exports, co-locate styles using CSS modules
- Git: conventional commits (feat:, fix:, chore:, etc.)
- All database queries go through service layer, never directly in routers
- Test naming: `test_{function_name}_{scenario}_{expected_result}`

## Common Tasks

- **API dev**: `cd api && uvicorn main:app --reload`
- **Frontend dev**: `cd web && pnpm dev`
- **Run all tests**: `./scripts/test.sh` (runs both Python and JS tests)
- **Run Python tests**: `cd api && pytest`
- **Run frontend tests**: `cd web && pnpm test`
- **Type check Python**: `cd api && mypy .`
- **DB migration**: `cd api && alembic revision --autogenerate -m "description"` then `alembic upgrade head`
- **Seed dev data**: `cd api && python -m scripts.seed`

## What NOT to Do

- Do NOT use `requests` library — use `httpx` (async support)
- Do NOT add dependencies without checking `pyproject.toml` for existing alternatives
- Do NOT use raw SQL — always go through SQLAlchemy ORM
- Do NOT create new Pydantic models without inheriting BaseSchema
- Do NOT put business logic in router functions — it belongs in the service layer
- Do NOT use `useEffect` for data fetching — use TanStack Query hooks

Advanced Patterns

Directory-Level Overrides

Place a CLAUDE.md in a subdirectory to add context specific to that part of the codebase:

<!-- File: src/billing/CLAUDE.md -->

## Billing Module

This module handles Stripe integration. Key constraints:

- All Stripe API calls go through `stripe_client.py` — never instantiate a Stripe client directly
- Webhook handlers in `webhooks.py` must be idempotent — Stripe retries on failure
- Use `Decimal` for all monetary amounts, never `float`
- Test with Stripe test mode keys only — see `.env.test` for configuration

This keeps your root CLAUDE.md lean while giving the agent deep context exactly where it needs it.

Conditional Instructions

When your project has modes or environments that change behavior:

## Environment-Specific Notes

- When writing tests: use the factories in `tests/factories/` to generate test data, never create objects manually
- When modifying the database schema: always create an Alembic migration, never modify models without one
- When working in the `infra/` directory: be extremely conservative, explain changes before making them

The .claude/ Directory

Claude Code stores project-level settings in a .claude/ directory:

  • .claude/settings.json — project-specific settings like allowed/denied tool permissions. This file can be committed to your repo so the entire team shares the same agent permissions.
  • .claude/settings.local.json — personal settings that override project settings. This file should be in .gitignore.

Note: The .claude/ directory is for Claude Code settings, not for your instructions. Your instructions go in CLAUDE.md.

Testing Your CLAUDE.md

After creating or updating your CLAUDE.md, verify it is working:

  1. Start a new session (old sessions do not reload CLAUDE.md changes).
  2. Ask the agent about your conventions:
    What conventions should you follow when writing code in this project?
    
  3. The agent should echo back your conventions. If it gives generic answers, your CLAUDE.md is not being loaded — check the filename and location.
  4. Run a small task and check whether the output follows your conventions without prompting.

This simple test catches the most common issues: wrong filename, wrong directory, or instructions that are too vague for the agent to act on.

Quick Reference

WhatWhere
Root instructions./CLAUDE.md
Directory instructions./path/to/dir/CLAUDE.md
Personal global instructions~/.claude/CLAUDE.md
Project settings.claude/settings.json
Personal settings.claude/settings.local.json

title: "Project Memory — Codex CLI" tested_with: codex-cli: "0.2.x" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: [01-your-first-hour]

Project Memory in Codex CLI

Codex CLI uses AGENTS.md as its primary project memory system. AGENTS.md is an open standard — not proprietary to any single tool — designed to give AI coding agents the context they need about your project.

AGENTS.md: The Open Standard

AGENTS.md emerged as a community-driven convention and is now under Linux Foundation stewardship. Over 60,000 repositories have adopted the format, making it the most widely supported project memory standard in the ecosystem.

The core idea is simple: a Markdown file in your repository that any AI coding agent can read to understand your project's conventions, architecture, and workflows. While Codex CLI has first-class support, the format is intentionally tool-agnostic — other agents and tools can (and do) consume it.

This matters for two reasons. First, your investment in writing AGENTS.md is portable. If you switch tools or add new ones, your project memory travels with you. Second, the standard evolves based on input from a broad community of practitioners, not a single vendor's roadmap.

Where AGENTS.md Lives and How Codex Loads It

AGENTS.md follows the same hierarchical pattern described in the concepts module:

  • Project root (./AGENTS.md) — loaded in every session. This is your primary configuration file.
  • Subdirectories (./src/api/AGENTS.md) — loaded when the agent works in or references that directory. Use these for module-specific guidance.

When you start a Codex CLI session, the tool automatically:

  1. Reads AGENTS.md from your project root
  2. Discovers and reads any AGENTS.md files in relevant subdirectories as the agent navigates your codebase
  3. Merges instructions, with more specific (deeper) files taking precedence over general (root) ones

No configuration flag or import statement is needed. If the file exists and is named correctly, Codex reads it.

Note: The filename must be exactly AGENTS.md (uppercase). Codex CLI will not recognize agents.md or Agents.md.

Anatomy of a Good AGENTS.md

The structure closely mirrors what works for any project memory file. A solid AGENTS.md covers these sections:

Project Overview

Ground the agent with a brief technical description.

## Project Overview

RecipeBox is a Django 5.0 recipe sharing platform with a React Native mobile app. Monorepo managed with Nx. PostgreSQL database, Redis for caching, Celery for background jobs.

Architecture

Map the territory for the agent.

## Architecture

- `apps/api/` — Django REST Framework application
- `apps/mobile/` — React Native (Expo) mobile app
- `libs/shared/` — Shared TypeScript types and utilities
- `libs/db/` — Database models, migrations, and fixtures
- `tools/` — Nx generators and CI scripts

Conventions

Define the rules that keep generated code consistent with your codebase.

## Conventions

- Python: Black formatter, isort for imports, type hints required on all public functions
- Django: fat models, thin views — business logic lives in model methods and managers
- React Native: functional components only, use React Navigation for routing
- All API endpoints must have OpenAPI docstrings
- Test files live next to the code they test: `foo.py` → `foo_test.py`

Common Tasks

Give the agent the commands it needs to build, test, and verify its work.

## Common Tasks

- **Install all**: `npm install` (Nx handles workspace dependencies)
- **API dev server**: `nx serve api`
- **Mobile dev**: `nx start mobile`
- **Run all tests**: `nx run-many --target=test`
- **Run API tests**: `nx test api`
- **Lint**: `nx run-many --target=lint`
- **DB migrations**: `cd apps/api && python manage.py makemigrations && python manage.py migrate`

What NOT to Do

Prevent the agent from repeating known mistakes.

## What NOT to Do

- Do NOT use Django's `JSONField` — use our typed JSON column wrapper in `libs/db/fields.py`
- Do NOT install packages globally — always scope to the correct Nx project
- Do NOT use `any` in TypeScript shared libraries
- Do NOT write raw SQL — use Django ORM querysets

AGENTS.md vs. CLAUDE.md: Similarities and Differences

If you work with both Codex CLI and Claude Code, you may wonder how these two files relate.

AspectAGENTS.mdCLAUDE.md
Used byCodex CLI (and other tools adopting the standard)Claude Code
StandardOpen, Linux Foundation stewardshipProprietary to Claude Code
File formatMarkdownMarkdown
Hierarchy supportRoot + subdirectoriesRoot + subdirectories + home directory
Content structureIdentical in practiceIdentical in practice
Community adoption60K+ reposGrowing, primarily Claude Code users

The content you put in both files is effectively the same: project overview, architecture, conventions, tasks, anti-patterns. The difference is which tool reads which file.

Note: Both files can coexist in the same repository. If your team uses both Codex CLI and Claude Code, maintain both files. See the section on coexistence below.

The .codex/ Directory

Codex CLI uses a .codex/ directory for project-level configuration:

  • .codex/config.yaml — project-wide settings including default model, approval mode, and environment variables the agent can access.
  • .codex/config.local.yaml — personal overrides (add to .gitignore).

A typical .codex/config.yaml:

# Adapt this by: setting your preferred model and approval mode.
model: o4-mini
approval_mode: suggest

Available approval modes control how much autonomy the agent gets:

ModeBehavior
suggestAgent proposes changes, you approve each one
auto-editAgent can edit files automatically but asks before running commands
full-autoAgent edits files and runs commands without asking

Warning: full-auto mode runs commands in a sandboxed environment, but you should still understand what the agent is doing. Start with suggest or auto-edit until you are comfortable with the agent's behavior on your project.

Codex-Specific Configuration

Beyond AGENTS.md, Codex CLI supports additional configuration that shapes agent behavior:

Environment Variables

You can specify which environment variables the agent can access in .codex/config.yaml:

# Adapt this by: listing only the variables your agent needs. Never expose secrets unnecessarily.
env:
  NODE_ENV: development
  DATABASE_URL: postgresql://localhost:5432/myapp_dev

Project-Level Instructions in Config

For short instructions that do not warrant a full AGENTS.md file, you can embed them directly in the config:

# Adapt this by: using this for brief, project-wide instructions only.
instructions: "Always run tests after making changes. Use conventional commit messages."

For anything longer than a sentence or two, use AGENTS.md instead — it is easier to read, review, and version.

Starter AGENTS.md Template

Copy this into your project root and fill in the details. Five minutes of effort, sessions of payoff.

# AGENTS.md

## Project Overview

<!-- One to two sentences: what is this project, what tech stack, any key context. -->

## Architecture

<!-- List 4-6 key directories and their purpose. -->

## Conventions

<!-- List 3-5 coding conventions that matter most in this project. -->

## Common Tasks

- **Install**: `<!-- your install command -->`
- **Dev server**: `<!-- your dev command -->`
- **Run tests**: `<!-- your test command -->`
- **Lint**: `<!-- your lint command -->`

## What NOT to Do

<!-- List 2-3 project-specific anti-patterns. -->

Coexisting: AGENTS.md and CLAUDE.md in the Same Repo

If your team uses both Codex CLI and Claude Code, you have three options:

Option 1: Maintain Both Files Separately

Keep AGENTS.md and CLAUDE.md at your project root, each with the same content. This is the simplest approach but creates a maintenance burden — you must update both files when conventions change.

Option 2: One Primary, One Minimal

Choose one file as your source of truth (whichever tool your team uses more) and make the other a brief pointer:

<!-- AGENTS.md (if CLAUDE.md is primary) -->
# AGENTS.md

See CLAUDE.md in this directory for full project conventions. This file exists for Codex CLI compatibility.

## Quick Reference

- **Run tests**: `pnpm test`
- **Dev server**: `pnpm dev`

On Unix-like systems, you can symlink one to the other:

# Adapt this by: choosing which filename to keep as the real file.
ln -s CLAUDE.md AGENTS.md

This ensures both files always have identical content. The trade-off is that you cannot include tool-specific instructions in either file, since both tools will read the same content.

Note: For most teams, Option 1 or Option 2 works best. The small duplication cost is usually worth the clarity of having each tool read its own file.

Testing Your AGENTS.md

After creating or updating your AGENTS.md, verify it works:

  1. Start a new Codex CLI session (previous sessions will not pick up changes mid-conversation).
  2. Ask the agent to describe your project conventions:
    What are the coding conventions for this project?
    
  3. Verify the response matches your AGENTS.md. If the agent gives generic advice, check your filename and file location.
  4. Run a small coding task and inspect whether the output follows your conventions without you needing to mention them.

Quick Reference

WhatWhere
Root instructions./AGENTS.md
Directory instructions./path/to/dir/AGENTS.md
Project config.codex/config.yaml
Personal config.codex/config.local.yaml
Inline instructionsinstructions key in config.yaml

title: "Exercises — Project Memory" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: [01-your-first-hour]

Exercises: Project Memory

These exercises build your skill with CLAUDE.md and AGENTS.md through hands-on practice on your own project. Each exercise increases in scope, starting with a minimal configuration and ending with an iterative improvement workflow you will use throughout your agentic development practice.

Note: These exercises work with either Claude Code (CLAUDE.md) or Codex CLI (AGENTS.md). The instructions reference both — use whichever tool you have set up.


Exercise 1: The Starter Config

Objective: Create a minimal project memory file and observe its immediate impact on agent behavior.

Steps

  1. Open one of your own projects (not a toy example — use a real codebase you actively work on).

  2. Create a CLAUDE.md or AGENTS.md file in the project root with exactly five lines:

    # Project Instructions
    
    ## Project Overview
    
    <!-- Replace this: one sentence describing your project, its tech stack, and its purpose. -->
    
  3. Start a new agent session and run three different tasks. Choose tasks that represent your typical workflow — for example:

    • "Add input validation to the user registration endpoint"
    • "Write unit tests for the cart total calculation"
    • "Refactor the dashboard component to extract the sidebar into its own file"
  4. For each task, note:

    • Did the agent use the right language and framework?
    • Did it place files in the correct directories?
    • Did it follow your project's naming conventions?
    • What did you have to correct?
  5. Write down the three corrections you made most often. These become candidates for your next config update.

Expected Outcome

Even a one-sentence project overview noticeably improves the agent's output — it will use the right framework, language, and general patterns. You will also have a concrete list of corrections to address in Exercise 2 and beyond.

Hints

  • If you are unsure what to write for your overview, answer this: "If a new developer asked what this project is, what would you say in one breath?"
  • Do not overthink the five lines. The point is to start small and observe the effect, not to write a perfect config on the first try.

Exercise 2: The Before/After Test

Objective: Measure the concrete impact of project memory by comparing agent output with and without configuration.

Steps

  1. Choose a single, well-defined coding task for your project. Pick something that involves your project's specific conventions — for example, adding a new API endpoint, creating a new React component, or writing a database migration. The task should take the agent 2-5 minutes.

  2. Run the task WITHOUT project memory. If you already have a CLAUDE.md or AGENTS.md, temporarily rename it:

    # Adapt this by: using the filename that matches your tool (CLAUDE.md or AGENTS.md).
    mv CLAUDE.md CLAUDE.md.bak
    

    Start a fresh agent session and give the task. Save the output (copy the generated code to a file like output-before.txt).

  3. Restore and enhance your project memory. Rename the file back and add a detailed configuration. Include at minimum:

    • Project overview (1-2 sentences)
    • Architecture (4-6 key directories)
    • Conventions (3-5 rules)
    • Common tasks (build, test, lint commands)
    mv CLAUDE.md.bak CLAUDE.md
    
  4. Run the same task WITH project memory. Start a fresh session (important — do not reuse the old session) and give the exact same task prompt. Save this output to output-after.txt.

  5. Compare the results side by side. For each output, evaluate:

    • Correct file placement (right directory)?
    • Correct naming convention (matches your project)?
    • Correct patterns used (right libraries, right abstractions)?
    • Code style consistency (looks like it belongs in your codebase)?
    • Number of corrections needed before the code is merge-ready?

Expected Outcome

The "after" output should require significantly fewer corrections. Common improvements include: correct import style, right state management library, proper file naming, correct test patterns, and adherence to your project's architectural boundaries. If the difference is subtle, your conventions may be close to standard patterns — try a task that involves a project-specific pattern.

Hints

  • Use the exact same prompt for both runs. Even small wording changes can affect output.
  • If you want a quantitative measure, count the number of lines you would need to change in each output before committing.
  • The biggest improvements usually show up in convention adherence (naming, imports, file placement) rather than in raw correctness.

Exercise 3: The Convention Test

Objective: Verify that the agent reliably follows a specific convention from your project memory file.

Steps

  1. Choose a concrete, verifiable coding convention and add it to your CLAUDE.md or AGENTS.md. Pick something unambiguous that you can check by reading the code. Good examples:

    • "Always use arrow functions for React components"
    • "Never use abbreviations in variable names — use customerAddress, not custAddr"
    • "All Python functions must have type hints for parameters and return values"
    • "Use vi.fn() for mocks in tests, never jest.fn()"
    • "Always add JSDoc comments to exported functions"

    Add your chosen convention to the Conventions section:

    ## Conventions
    
    - Always use arrow functions for React component definitions (e.g., `const MyComponent = () => { ... }`)
    
  2. Start a new agent session.

  3. Ask the agent to write code that would naturally involve your convention. For example, if your convention is about arrow functions for React components, ask it to create two or three new components. If it is about variable naming, ask it to write a function that handles several data objects.

  4. Inspect every instance in the output where the convention should apply. Check:

    • Did the agent follow the convention consistently?
    • Did it follow the convention in all instances, or only some?
    • Did it follow the convention without you mentioning it in the prompt?
  5. If the agent did not follow the convention, revise your wording. Common fixes:

    • Make the instruction more specific (before: "use good variable names"; after: "use full English words for variable names, never abbreviations")
    • Add an example of the correct pattern
    • Add an example of the incorrect pattern with "do NOT" prefix
  6. Repeat the test with the revised wording until the agent follows it reliably.

Expected Outcome

The agent should follow your convention in the majority of cases (80%+ compliance is realistic). If compliance is low, the issue is almost always that the instruction is too vague. Iterating on the wording is a core project memory skill — you are learning not just to configure the agent, but to write instructions that translate reliably into behavior.

Hints

  • Conventions that include a concrete example ("use const Foo = () => {}, not function Foo() {}") have higher compliance than abstract rules.
  • If the agent follows the convention 3 out of 4 times, the instruction is probably fine — occasional misses are normal. If it follows it 1 out of 4 times, the instruction needs rewriting.
  • Test one convention at a time. If you add five conventions at once, you will not know which one the agent is struggling with.

Exercise 4: The Progressive Build

Objective: Build a high-quality project memory file over five sessions using the iterative approach from the concepts module.

Steps

This exercise spans five separate agent sessions over one to two weeks. Do not try to complete it in a single sitting.

Session 1: Baseline

  1. Start with a minimal CLAUDE.md or AGENTS.md (5-10 lines — your project overview and one or two conventions).
  2. Use the agent for a real task on your project.
  3. After the session, write down every correction you made. Pick the single most impactful one.
  4. Add that correction as a new instruction in your config.

Session 2: First Iteration

  1. Start a new session with your updated config.
  2. Use the agent for a different real task.
  3. Note: did the agent avoid the mistake from Session 1? (It should.)
  4. Write down new corrections. Add the most impactful one to your config.

Session 3: Architecture Pass

  1. By now, you likely have a feel for what the agent gets right and wrong about your project's structure. Add an Architecture section with your key directories if you have not already.
  2. Use the agent for a task that involves navigating multiple parts of the codebase.
  3. Note corrections. Add the most impactful one to your config.

Session 4: Anti-Patterns Pass

  1. Review your accumulated corrections. Are there any "do NOT" instructions that would have prevented multiple mistakes? Add a What NOT to Do section.
  2. Use the agent for a task that might trigger one of your anti-patterns.
  3. Note whether the anti-pattern instructions prevented the mistake.

Session 5: Pruning

  1. Read your config file end to end. Are there instructions that are redundant, too vague, or not earning their token cost?
  2. Remove or consolidate any low-value instructions.
  3. Use the agent for your most common type of task. Evaluate the overall quality of the output.
  4. Compare: how does the agent's output in Session 5 compare to Session 1?

Tracking: After each session, record these metrics in a simple log:

<!-- Adapt this by: replacing with your actual observations after each session. -->
| Session | Config Lines | Task | Corrections Needed | New Instruction Added |
|---------|-------------|------|--------------------|-----------------------|
| 1       | 5           |      |                    |                       |
| 2       |             |      |                    |                       |
| 3       |             |      |                    |                       |
| 4       |             |      |                    |                       |
| 5       |             |      |                    |                       |

Expected Outcome

By Session 5, you should see a measurable reduction in corrections per session. A typical trajectory: Session 1 requires 5-8 corrections; Session 5 requires 1-2. Your config file should be 20-40 lines — detailed enough to be effective, concise enough to not waste context window space. You will also have internalized the iterative improvement loop: use, observe, add, prune.

Hints

  • Real sessions only. Do not invent fake tasks for this exercise. The value comes from discovering what your specific project needs, which you can only learn from actual work.
  • One instruction per session keeps the signal clear. If you add five instructions at once, you cannot tell which one helped.
  • The pruning session (Session 5) is the most important. The goal is not a long config file — it is an effective one. A 25-line file with high-impact instructions beats a 75-line file with filler.
  • If you share the project with teammates, ask them to try the same exercise. Their corrections will be different from yours, and combining perspectives produces a stronger config.

title: "Prompting for Agents" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: [02-project-memory]

Prompting for Agents

The Core Question: How Do I Give Instructions That Work?

Every developer who starts working with coding agents hits the same wall. You type something into the prompt, the agent does... something, and you think: "That's not what I meant." The gap between what you intended and what the agent produced is the prompting gap, and closing it is the single most important skill in agentic development.

The good news: prompting is not mystical. It is a practical skill with learnable patterns. You do not need to memorize magic phrases or speak in a special syntax. You need to understand how agents interpret instructions and adjust your communication style accordingly.

This module teaches you how to give instructions that consistently produce the results you want.

Why Agent Prompting Is Different from Chat Prompting

When you prompt a chatbot, the worst outcome is a bad answer. You read it, shrug, and try again. When you prompt a coding agent, the stakes are fundamentally different: the agent takes actions. It edits files, runs commands, creates directories, installs packages. A misunderstood instruction does not produce a bad paragraph — it produces changed code that you now need to understand and potentially undo.

This difference has three practical consequences:

  1. Clarity matters more. Ambiguity in a chat prompt wastes a few seconds. Ambiguity in an agent prompt can send the agent down a long wrong path, editing multiple files before you realize something went sideways.

  2. Scope matters more. Asking a chatbot to "explain everything about React" just gives you a long answer. Asking an agent to "refactor everything in the project" can trigger sweeping changes that are hard to review.

  3. Context matters more. A chatbot can ask clarifying questions cheaply. An agent that lacks context might make reasonable-sounding but wrong assumptions and act on them immediately.

The upside: agents are also more forgiving than you might expect. They can read your codebase, look up function signatures, and figure out many details on their own. The skill is knowing what to specify and what to let the agent discover.

The Task Decomposition Principle

Complex work breaks down into steps. This is true for humans and it is true for agents. The difference is that agents handle decomposition better when you do the high-level breakdown and let them handle the low-level details.

Consider the task "add user authentication to this app." That is not one task — it is at least five:

  1. Choose and install an auth library
  2. Create a user model and database migration
  3. Build login and registration endpoints
  4. Add middleware to protect existing routes
  5. Write tests for the auth flow

Each of those is a well-scoped agent task. The full request is not. When you give an agent a task that is too large, it has to make many sequential decisions, and the probability of drifting from your intent compounds with each decision.

Decomposition is not about micromanaging. You are not dictating every line of code. You are drawing boundaries around coherent chunks of work so the agent can focus, produce reviewable output, and get your feedback before moving to the next chunk.

The Intent vs. Instruction Spectrum

There is a spectrum between telling an agent what you want (intent) and telling it exactly how to do it (instruction). The sweet spot is closer to intent than most developers initially expect.

Too much intent (vague):

"Make the API better."

The agent has no idea what "better" means to you. Faster? More endpoints? Better error handling? Different response format?

Too much instruction (prescriptive):

"Open src/api/users.ts, go to line 47, change the if statement to use a ternary operator, then on line 52 rename the variable from 'res' to 'response', then..."

You are doing the agent's job for it, badly. The agent is better at navigating code than you are at dictating line-by-line changes from memory.

The sweet spot (intent with context):

"The user API endpoint in src/api/users.ts returns generic 500 errors. Refactor the error handling to return specific HTTP status codes and helpful error messages."

This tells the agent what is wrong, where to look, and what the outcome should look like — without dictating the implementation. The agent can read the file, understand the current error handling, and choose the right approach.

The principle: specify the what and the why. Let the agent figure out the how.

Context Management

Agents operate within a context window — a finite amount of text they can consider at once. Managing this context is a practical skill.

What to include in your prompt:

  • The specific goal of the task
  • Which files or areas of the codebase are relevant
  • Any constraints ("don't change the public API," "use the existing database schema")
  • Examples of similar patterns in the codebase ("follow the pattern in src/api/orders.ts")

What to omit:

  • General background the agent can find by reading files (it can look at your package.json itself)
  • Implementation details you are not sure about (let the agent decide)
  • Long explanations of how the codebase works (point to files instead)

What the agent can find on its own:

  • File contents, project structure, dependency lists
  • Function signatures, type definitions, existing tests
  • Configuration files, build scripts, existing patterns

A common mistake is pasting large blocks of code into your prompt. Instead, reference the file: "Look at the validateUser function in src/auth/validation.ts." The agent will read it with full fidelity. Your pasted code might be stale, truncated, or missing context.

The "Good Enough" First Prompt

Do not spend ten minutes crafting the perfect prompt. Spend thirty seconds writing a reasonable one, send it, and see what happens.

Agents are interactive. Your first prompt starts a conversation, not a contract. If the result is 80% right, a quick correction gets you to 100% faster than trying to anticipate every edge case in your initial prompt.

This is counterintuitive for developers who are used to writing detailed specifications. With agents, iteration is cheaper than specification. A good-enough prompt followed by two rounds of feedback almost always outperforms a meticulously engineered prompt with no feedback.

The practical workflow:

  1. Write a clear but brief prompt
  2. Review the result
  3. Give specific feedback on what to change
  4. Repeat until satisfied

This loop is fast — usually faster than the time you would spend trying to write the "perfect" prompt upfront.

Prompt Anti-Patterns

Three patterns consistently produce poor results:

Being Too Vague

"Make it better." "Fix the bugs." "Clean this up."

The agent does not share your mental model. "Better" to you might mean "faster," "more readable," "more secure," or something else entirely. Always specify what dimension of improvement you care about.

Being Too Prescriptive

"On line 34, change the variable name from x to count. Then on line 35..."

This defeats the purpose of using an agent. If you know exactly what changes to make at the character level, you are faster making them yourself. Agents excel when you describe the problem and let them find the solution.

Context Overload

[Three pages of background, architecture decisions, team conventions, historical context, and tangentially related requirements]

Long prompts bury the actual task in noise. The agent may lose focus on what you actually want. Keep it tight: goal, location, constraints. If you need to convey detailed project conventions, put them in a CLAUDE.md or codex.md file where they persist across sessions (see Module 02).

The Feedback Loop

The most underused prompting technique is feedback. After the agent produces its first result, you have a conversation:

"This is close, but the error messages should include the field name that failed validation."

"Good, but you added a new dependency. Use the built-in Node.js crypto module instead."

"The logic is right but the function is too long. Extract the validation into a separate helper."

Each round of feedback is a prompt in itself, and it benefits from the same principles: be specific, state what you want, and reference concrete details. The agent retains full context of the conversation, so your feedback compounds. By round three, the agent has a much richer understanding of your intent than any single prompt could convey.

When to Start a New Session

Sessions have memory within them but not across them (unless you use project memory files). Over a long session, the context window fills up and the agent may lose track of earlier details.

Start a new session when:

  • You are switching to an unrelated task
  • The conversation has gone on for many turns and the agent seems to be losing coherence
  • You want a fresh perspective on a problem (the agent's earlier wrong assumptions might be anchoring it)
  • The context window is getting full and the tool suggests compacting

Continue the current session when:

  • You are iterating on the same task
  • The agent has built up useful context about your code that would be expensive to re-establish
  • You are in the middle of a multi-step workflow

A practical heuristic: if you would context-switch as a human, context-switch as an agent user too.

Scoping: The Sweet Spot for Task Size

Tasks that are too small create overhead. If you ask the agent to rename a single variable, you spent more time writing the prompt and reviewing the result than just making the change yourself.

Tasks that are too large create drift. If you ask the agent to "build the entire backend," it will make hundreds of decisions without your input, and the result will likely diverge from what you wanted.

The sweet spot is a task that:

  • Has a clear, testable outcome ("the tests pass," "the endpoint returns the right data," "the component renders correctly")
  • Touches a bounded area of code (one file, one module, one feature)
  • Can be reviewed in a few minutes
  • Takes the agent one to five minutes to complete

This typically maps to things like: fix a specific bug, add one endpoint, refactor one function, write tests for one module, update one component. These are the daily building blocks of development work, and they are exactly what agents handle best.

Key Takeaways

  1. Agent prompts have higher stakes than chat prompts because agents take actions, not just generate text. Clarity and scope matter.

  2. Decompose complex work into agent-sized subtasks. Let the agent handle implementation details within each subtask.

  3. Specify intent, not instructions. Tell the agent what you want and why. Let it figure out how.

  4. Manage context deliberately. Include the goal, location, and constraints. Omit what the agent can find on its own.

  5. Iterate, do not over-engineer. A good-enough first prompt plus two rounds of feedback beats a perfect prompt with no feedback.

  6. Avoid the extremes of too vague, too prescriptive, or too much context.

  7. Use feedback actively. Each correction sharpens the agent's understanding of your intent within the session.

  8. Right-size your tasks. Not so small that prompting is overhead, not so large that the agent drifts.

The developers who get the most out of coding agents are not the ones who write the cleverest prompts. They are the ones who communicate clearly, scope work well, and iterate fast.


title: "Prompting for Agents — Claude Code" tested_with: claude-code: "1.0.x" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: [02-project-memory]

Prompting for Agents — Claude Code

This guide covers prompting techniques specific to Claude Code. For general prompting principles, see concepts.md in this module.

Plan Mode

Plan mode tells Claude Code to think through a task before executing it. Instead of immediately editing files and running commands, the agent produces a structured plan that you can review, modify, or approve.

When to Use Plan Mode

Use plan mode when:

  • The task is complex and touches multiple files or systems
  • You want to understand the agent's approach before it starts making changes
  • You are unfamiliar with the area of code being modified
  • The task has multiple valid approaches and you want to choose one
  • You are working on something risky (database migrations, auth changes, deployment configs)

Skip plan mode when:

  • The task is straightforward and well-scoped ("fix the typo in the error message")
  • You have high confidence in what needs to happen
  • You are iterating on a previous result and the direction is already established

How to Use Plan Mode

Activate plan mode by including the word "plan" in your prompt or by pressing Shift+Tab to toggle into plan mode before sending:

Plan how to add rate limiting to the API endpoints in src/api/

Claude Code will respond with a numbered plan: which files it will examine, what changes it will make, and in what order. You can then:

  • Approve the plan: "Looks good, go ahead."
  • Modify the plan: "Skip step 3. For step 4, use the existing Redis connection instead of creating a new one."
  • Reject and redirect: "Actually, I want to use middleware instead of per-endpoint logic. Replan."

Plan mode is especially valuable early in a project when you are still establishing patterns. Once patterns are established and documented in your CLAUDE.md, you can rely on direct execution more often.

Slash Commands

Claude Code provides built-in slash commands that control the tool's behavior. These are not prompts to the agent — they are commands to the interface.

Essential Commands

  • /help — Show available commands and usage information. Use this when you forget a command or want to explore capabilities.

  • /clear — Reset the conversation. All context from the current session is discarded. Use this when switching tasks or when the conversation has become too long and unfocused.

  • /compact — Compress the conversation history to reclaim context window space. The agent summarizes prior turns into a condensed form. Use this when you are deep into a long session and want to keep going without starting fresh. Unlike /clear, the agent retains a summary of what happened.

  • /model — Switch the underlying model. Useful when you want to try a different model for a particular task or balance cost vs. capability.

When to Use /compact vs. /clear

Use /compact when the earlier conversation context is still relevant — you are mid-task and need to continue, but the context window is getting full. The agent preserves the essence of what you discussed.

Use /clear when you are starting something genuinely new. Carrying old context into a new task adds noise and can bias the agent's approach.

Multi-Turn Conversations

Claude Code retains full context within a session. This means your second prompt builds on the first, your third builds on the second, and so on. This is powerful but requires a different approach than single-shot prompting.

Building on Previous Work

After the agent completes a task, you can refine it without restating the full context:

Turn 1: "Add input validation to the createUser function in src/api/users.ts"

Turn 2: "Good. Now add the same validation pattern to updateUser in the same file."

Turn 3: "The email validation is too strict — it's rejecting addresses with plus signs. Fix that in the validation helper you created."

Each turn is short and specific because the agent already knows what you are working on, which files are involved, and what patterns have been established.

The Compound Context Advantage

By turn three, the agent understands:

  • Your codebase structure
  • The validation pattern it created
  • Your preferences (you noticed the plus-sign issue, so you care about edge cases)
  • The specific files and functions involved

This compound context is more valuable than any single prompt could be. It is one reason why multi-turn iteration outperforms one-shot prompting for complex work.

Referencing Files Explicitly

One of the most effective prompting techniques in Claude Code is pointing directly to files and code locations. The agent can read files, but telling it where to look saves time and reduces ambiguity.

Direct File References

Look at src/auth/middleware.ts and fix the bug where expired tokens aren't rejected.
The test in tests/api/users.test.ts is failing on line 42. The assertion expects a 200 but the endpoint now returns 201.
Compare the error handling in src/api/orders.ts with src/api/users.ts. Make users.ts match the orders pattern.

Why This Works

Explicit file references eliminate the agent's search phase. Instead of scanning the project to figure out where the relevant code lives, it goes directly to the right file. This is faster and reduces the chance of the agent finding and modifying the wrong file.

The "Do X, Then Y" Pattern

For multi-step tasks, structure your prompt as a sequence:

First, add a "lastLogin" timestamp field to the User model in src/models/user.ts.
Then update the login endpoint in src/api/auth.ts to set this field on successful login.
Finally, add a test that verifies lastLogin is updated after login.

This pattern works because:

  • Each step has a clear deliverable
  • The steps build on each other logically
  • The agent can complete and verify each step before moving to the next
  • You can review the intermediate results

Keep the sequence to two to four steps. Beyond that, break the work into separate prompts so you can review and course-correct between chunks.

Correcting Course Mid-Task

When the agent produces something that is not quite right, give targeted feedback rather than starting over:

Instead of: "That's wrong, try again."

Say: "The function works but it's modifying the input array in place. Return a new array instead."

Instead of: "I don't like this approach."

Say: "This approach adds a new dependency. Rewrite it using only the standard library."

Instead of: "Start over."

Say: "Keep the test file you created but rewrite the implementation to use a queue instead of polling."

Specific corrections preserve the work that was done right and fix only what needs changing. The agent does not need to re-derive the parts that already met your expectations.

Using Images and Screenshots

Claude Code accepts images in prompts. This is useful for:

  • UI bugs: Paste a screenshot showing the visual problem. "Here is what the dashboard looks like. The chart legend is overlapping the title. Fix the CSS."
  • Design references: Share a mockup. "Implement this design in the ProfileCard component."
  • Error messages: Screenshot a complex error from a browser console or terminal output that is hard to copy as text.

To include an image, drag it into the prompt or paste it from your clipboard. Combine images with text for best results — the image shows the what, your text explains the task.

Effective Prompt Templates

These templates encode the principles from the concepts module into concrete patterns for common tasks.

Bug Fixing

There's a bug where [describe the symptom — what happens vs. what should happen].

The relevant code is in [file path]. [Optionally: I think the issue is in the [function/section] area.]

Fix the bug and add a test that would have caught it.

Example:

There's a bug where users can submit the registration form with an empty email field. The form should show a validation error instead.

The relevant code is in src/components/RegistrationForm.tsx. The validation logic is in the handleSubmit function.

Fix the bug and add a test that verifies empty emails are rejected.

Feature Addition

Add [feature description] to [component/module].

Follow the pattern used in [existing similar feature] for consistency.

[Any constraints: "Don't add new dependencies," "Keep the existing API," etc.]

Example:

Add a "forgot password" flow to the auth module.

Follow the pattern used in the email verification flow in src/api/auth/verify-email.ts for consistency.

Use the existing email service in src/services/email.ts to send the reset link. Don't add new dependencies.

Refactoring

Refactor [file or function] to [goal — what the refactored version should achieve].

Don't change the public API — existing callers should not need to be updated.

[Any specific constraints or preferences.]

Example:

Refactor the processOrder function in src/services/orders.ts to separate validation, pricing, and persistence into distinct helper functions.

Don't change the public API — processOrder should still accept the same arguments and return the same type.

Add JSDoc comments to each new helper function.

Code Review

Review [file or directory] for [specific concerns].

Suggest improvements but don't make changes yet.

Example:

Review src/api/payments.ts for security issues. Specifically check for:
- SQL injection vulnerabilities
- Improper input validation
- Sensitive data in logs
- Missing authentication checks

Suggest improvements but don't make changes yet.

This last template uses plan mode implicitly — by saying "suggest but don't change," you get the agent's analysis without it modifying code. You can then selectively ask it to implement specific suggestions.

Summary

The key Claude Code-specific techniques are:

  1. Plan mode for complex or risky tasks — review the strategy before execution
  2. Slash commands to manage your session — /compact to reclaim space, /clear to reset
  3. Explicit file references to eliminate search time and ambiguity
  4. Sequential steps with the "do X, then Y" pattern for multi-step work
  5. Targeted corrections that preserve good work and fix only what needs changing
  6. Images for UI bugs, design references, and complex errors
  7. Prompt templates that encode good patterns for recurring task types

Combine these with the general principles from the concepts module — intent over instruction, right-sized tasks, iterative feedback — and you will consistently get strong results from Claude Code.


title: "Prompting for Agents — Codex CLI" tested_with: codex-cli: "0.2.x" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: [02-project-memory]

Prompting for Agents — Codex CLI

This guide covers prompting techniques specific to Codex CLI. For general prompting principles, see concepts.md in this module.

Approval Modes Shape Your Prompting Strategy

Codex CLI operates in different approval modes, and each mode changes how you should think about prompting. The mode you choose determines how much autonomy the agent has, and that directly affects how precise your prompts need to be.

Suggest Mode

In suggest mode, Codex proposes changes but does not execute them until you approve. Every file edit, every command, every action requires your explicit sign-off.

How this affects prompting: You can afford to be more exploratory. Since nothing happens without your approval, there is less risk in giving broad or experimental prompts.

Explore the codebase and suggest how to improve the error handling across all API endpoints.
What would it take to migrate from Express to Fastify? Show me the changes without making them.

These prompts would be risky in full-auto mode because the agent might start making sweeping changes. In suggest mode, they are safe — you review each proposed change and approve only what makes sense.

Suggest mode is ideal for:

  • Learning a new codebase (ask the agent to explain and propose, then review)
  • Exploratory refactoring (see what the agent suggests before committing to a direction)
  • High-risk changes (database migrations, auth modifications, config changes)
  • When you are not yet confident in the agent's judgment for your project

Auto-Edit Mode

In auto-edit mode, Codex can write files automatically but still requires approval for shell commands. This is a middle ground — file changes happen without friction, but anything that executes (tests, builds, installs) needs your approval.

How this affects prompting: Be clear about the scope of file changes, since those will happen immediately. You still have a safety net for commands.

Update all the TypeScript interfaces in src/types/ to use strict null checks. Then run the type checker to see what breaks.

The file updates happen automatically. When the agent tries to run tsc, you get a chance to review and approve.

Full-Auto Mode

In full-auto mode, Codex executes everything — file edits and commands — without asking. This is the fastest workflow but requires the most precise prompting.

How this affects prompting: Be specific and bounded. State exactly what you want, specify constraints clearly, and scope the task tightly.

Fix the failing test in tests/api/users.test.ts. The test expects a 200 status but the endpoint returns 201. Update the test assertion, don't change the endpoint.

Notice the explicit constraint: "update the test assertion, don't change the endpoint." In full-auto mode, without that constraint, the agent might choose to change the endpoint instead — both are valid fixes, but you have a preference.

Full-auto mode prompting tips:

  • Always specify constraints on what should NOT change
  • Scope tasks to specific files or functions
  • Include your definition of "done" (tests pass, type checks pass, etc.)
  • Avoid open-ended exploration — save that for suggest mode

The Sandbox and What It Means for Prompting

Codex CLI runs in a sandboxed environment that restricts network access and filesystem writes to the project directory. This sandbox protects your system but also constrains what you can ask the agent to do.

What the sandbox allows:

  • Reading and writing files within your project directory
  • Running commands that operate locally (tests, linters, build tools)
  • Using tools and dependencies already installed in your project

What the sandbox restricts:

  • Network requests (no fetching external APIs, no installing new packages from the internet)
  • Writing outside the project directory
  • Accessing system-level resources

How this affects prompting: Do not ask the agent to do things that require network access or system-level operations. If your task requires installing a new package, install it yourself first, then ask the agent to use it.

# This will fail in the sandbox:
"Install lodash and use it to rewrite the utility functions."

# This works (after you install lodash yourself):
"Rewrite the utility functions in src/utils.ts to use lodash. It's already installed."

When the agent encounters a sandbox restriction, it will tell you. Adjust your prompt to work within the constraints, or perform the restricted action yourself and then continue.

Multi-Turn Conversations in Codex

Codex CLI supports multi-turn conversations where each prompt builds on the previous context. This works similarly to Claude Code but with a few differences to keep in mind.

Building Context Across Turns

Turn 1: "Read through the src/api/ directory and summarize what each endpoint does."

Turn 2: "The orders endpoint is missing pagination. Add limit and offset query parameters."

Turn 3: "Good. Now add the same pagination pattern to the products endpoint."

Each turn benefits from the agent's accumulated understanding. By turn three, the agent knows your project structure, the pagination pattern you approved, and which endpoints you care about.

Keeping Sessions Focused

Codex CLI sessions work best when they stay focused on a coherent thread of work. Context accumulates with each turn, and unrelated tangents dilute the agent's focus.

If you find yourself switching topics, consider starting a new session. The cost of re-establishing context is usually less than the cost of an agent that is juggling two unrelated tasks in its memory.

Referencing Specific Files and Functions

Just like with Claude Code, explicit references are one of the most effective prompting techniques:

Look at the validateEmail function in src/utils/validation.ts. It doesn't handle international domain names. Fix it.
The OrderService class in src/services/orders.ts has a calculateTotal method that doesn't account for discounts. Update it to apply the discount rules defined in src/config/pricing.ts.

When you reference files, the agent reads them directly instead of searching the project. This is faster and eliminates the chance of it finding the wrong file.

Referencing Patterns

A powerful technique is pointing to an existing implementation as a template:

Add a DELETE endpoint for products. Follow the same pattern as the DELETE endpoint in src/api/orders.ts, including the soft-delete logic and the audit log entry.

This leverages existing code as a specification. The agent reads the referenced implementation and replicates the pattern, adapted for the new context.

Effective Prompt Patterns for Common Tasks

Bug Fix (Suggest Mode)

There's a race condition in src/services/queue.ts where two workers can pick up the same job. Investigate and suggest a fix.

In suggest mode, the agent will analyze the code, identify the race condition, and propose a fix for your review.

Bug Fix (Full-Auto Mode)

The queue processor in src/services/queue.ts has a race condition on the job claim. Add a database-level lock to ensure only one worker can claim each job. Run the existing tests to verify nothing breaks.

In full-auto mode, the prompt is more specific: it names the fix approach (database-level lock) and includes a verification step (run tests).

Feature Addition

Add a health check endpoint at GET /health that returns a JSON object with:
- status: "ok" or "degraded"
- database: result of a simple query
- uptime: process uptime in seconds

Add it in src/api/health.ts following the route registration pattern in src/api/index.ts.

Test Writing

Write unit tests for the src/services/pricing.ts module. Cover:
- Standard pricing calculation
- Discount application
- Tax calculation
- Edge cases: zero quantity, negative prices, missing product

Put the tests in tests/services/pricing.test.ts. Use the testing patterns from the existing test files.

Code Exploration

I'm new to this codebase. Walk me through how a request flows from the API endpoint in src/api/orders.ts through to the database. What services, middleware, and models are involved?

This is a great use of suggest mode — no changes needed, just analysis.

How Prompting Differs Across Models

Codex CLI can work with different underlying models, and each model has different strengths. Your prompting may need slight adjustments:

More capable models handle ambiguity better. You can be more concise and trust the model to fill in gaps. Complex multi-step tasks work well in a single prompt.

Smaller or faster models benefit from more explicit instructions. Break tasks into smaller steps, be more specific about what you want, and include examples of the desired output format when relevant.

A practical approach: start with your natural prompting style. If the results are not what you expected, add more detail. If you are consistently writing very long prompts, you might be over-specifying — try being more concise and see if the quality holds.

Tips for Getting Consistent Output

Be Explicit About Output Format

If you care about how the result looks, say so:

Add error codes to the API responses. Use the format: { "error": { "code": "AUTH_EXPIRED", "message": "..." } }

Reference Existing Conventions

Add logging to the payment service. Use the same logger and log format as src/services/orders.ts.

State What Not to Change

Refactor the database queries in src/repositories/users.ts to use parameterized queries. Don't change the function signatures — the service layer depends on them.

Include Verification Steps

Update the date formatting to use ISO 8601. After making the changes, run the test suite to check for regressions.

Use the codex.md File for Persistent Conventions

If you find yourself repeating the same instructions across sessions (coding style, preferred libraries, project-specific patterns), put them in a codex.md file at the project root. This file is automatically read at the start of each session, so conventions you document there do not need to be restated in every prompt. See Module 02 for details on project memory.

Summary

The key Codex CLI-specific techniques are:

  1. Match your prompting precision to your approval mode — exploratory in suggest mode, precise in full-auto mode
  2. Understand sandbox constraints and work within them — pre-install dependencies, avoid network-dependent tasks
  3. Use multi-turn conversations to build context and iterate, but keep sessions topically focused
  4. Reference files and existing patterns to eliminate ambiguity and leverage your codebase as a specification
  5. Adjust for model capabilities — more concise for stronger models, more explicit for smaller ones
  6. Use codex.md for persistent conventions so you do not repeat yourself across sessions

Combined with the general principles in the concepts module, these techniques will help you get reliable, high-quality results from Codex CLI across different tasks and approval modes.


title: "Exercises — Prompting for Agents" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: [02-project-memory]

Exercises — Prompting for Agents

These exercises build practical prompting skills through hands-on experimentation. Each one is designed to reveal something specific about how agents interpret and respond to different kinds of instructions.

You will need a project to work with. If you do not have one available, create a small Express or Fastify API with a few endpoints, or use any existing project you are comfortable experimenting with.


Exercise 1: The Decomposition Challenge

Objective

Learn to break complex tasks into well-scoped agent subtasks and develop an intuition for the right task size.

Steps

  1. Start with this complex task: "Add user authentication to this app." (If your project already has auth, substitute another large feature: "Add a notification system," "Add role-based access control," or "Add full-text search.")

  2. Decompose the task into 5-7 subtasks. For each subtask, write a single-sentence prompt that you would give to a coding agent. Aim for prompts that are self-contained — each one should make sense on its own, even though they form a sequence.

  3. Review your list. For each subtask, evaluate:

    • Is it too big? Could the agent drift from your intent before you get to review? Does it require too many sequential decisions? A task is too big if you cannot review the full output in under five minutes.
    • Is it too small? Would it be faster to just do it yourself? Does the overhead of prompting and reviewing exceed the work itself? A task is too small if it takes longer to describe than to do.
    • Is it well-scoped? Does it have a clear, verifiable outcome? Could you tell whether the agent succeeded just by looking at the result?
  4. Rewrite any subtasks that are too big or too small. Aim for the sweet spot described in the concepts module.

  5. (Optional) Actually run your prompts sequentially. After each one, review the result before sending the next. Note where you needed to give corrections and whether those corrections suggest the task was mis-scoped.

Expected Outcome

You should have a list of 5-7 subtasks, each with a clean one-sentence prompt and a clear deliverable. At least one of your original subtasks should have been too big (requiring further decomposition) and at least one should have been too small (better merged with another subtask or done manually).

Hints

  • A good subtask often maps to a single file, module, or concept. "Create the user model and migration" is a natural unit. "Create the user model, migration, and all API endpoints" is probably too large.
  • Think about dependency order. Some subtasks must complete before others can start. Your sequence should reflect this.
  • If a subtask prompt requires more than two sentences to be clear, the task may be too big.

Exercise 2: Vague vs. Precise

Objective

Develop a feel for the intent-vs-instruction spectrum by writing three versions of the same prompt and observing how the agent responds to each.

Steps

  1. Choose a concrete task in your project. Good candidates:

    • Improving error handling in a specific file
    • Adding input validation to a form or API endpoint
    • Refactoring a function that is too long or complex
    • Adding tests to an untested module
  2. Write three prompts for this task:

    Prompt A — Too Vague: Write a prompt that is genuinely too vague. Omit the file name, omit specifics about what "better" means, and use hand-wavy language. Example: "Make the error handling better."

    Prompt B — Too Prescriptive: Write a prompt that dictates exact implementation details. Specify line numbers, variable names, exact code to write. Leave the agent no room to use its own judgment. Example: "On line 23 of src/api/users.ts, wrap the existing code in a try-catch. In the catch block, create a variable called errMsg set to error.message, then call res.status(500).json({ error: errMsg })."

    Prompt C — Balanced: Write a prompt that specifies intent, points to the relevant file, states the desired outcome, and includes any constraints — but lets the agent choose the implementation. Example: "The error handling in src/api/users.ts returns generic 500 errors for all failures. Refactor it to return appropriate HTTP status codes (400 for validation errors, 404 for not found, 500 for unexpected errors) with descriptive error messages. Keep the response format consistent with the other API endpoints."

  3. Run all three prompts in separate sessions (use /clear or start new sessions between them). Do not modify or guide the agent after the initial prompt — let each prompt stand on its own.

  4. Compare the three results:

    • Which prompt produced the best code?
    • Which one required the least follow-up to be production-ready?
    • Where did the vague prompt go wrong? Was the result bad, or just not what you wanted?
    • Where did the prescriptive prompt go wrong? Did the agent follow your instructions even when they were suboptimal?
  5. Document your findings. Write a paragraph about what made the balanced prompt work better than the other two.

Expected Outcome

The vague prompt should produce something that technically works but misses your actual intent — it will "improve" something, but not necessarily the dimension you cared about. The prescriptive prompt should produce exactly what you described, which may be worse than what the agent would have chosen on its own. The balanced prompt should produce the best result because it combines your domain knowledge (what needs to change and why) with the agent's implementation skill (how to change it).

Hints

  • For the vague prompt, resist the urge to be helpful. The goal is to see what happens when the agent lacks guidance.
  • For the prescriptive prompt, try to dictate code from memory without looking at the file. Notice how the agent handles instructions that do not quite match the actual code.
  • The balanced prompt should be shorter than the prescriptive one. If it is longer, you are probably over-specifying.

Exercise 3: The Correction Loop

Objective

Practice giving effective iterative feedback and experience how corrections compound within a session.

Steps

  1. Choose a moderately complex task. Good candidates:

    • "Add a search feature to the users API endpoint with filtering by name and email"
    • "Create a logging middleware that logs request method, path, status code, and response time"
    • "Add pagination to the list endpoint with page and limit query parameters"
  2. Send your initial prompt. Use the balanced prompting style from Exercise 2 — clear intent, specific file references, stated outcome. Do NOT try to anticipate every edge case.

  3. Review the first result. Find something that is not quite right. It might be:

    • A missing edge case
    • A suboptimal implementation choice
    • A style inconsistency with the rest of the codebase
    • Missing error handling
    • Missing tests
  4. Round 1 correction: Give specific feedback. Use this format: "This is close, but [specific observation]. [Specific change you want]."

    Example: "This is close, but the search is case-sensitive. Make the name and email filters case-insensitive using a lower-case comparison."

  5. Review the updated result. Find another issue.

  6. Round 2 correction: Build on the previous context. You do not need to re-explain the task.

    Example: "Good. Now the search query parameter should also support partial matches — if I search for 'john', it should match 'Johnson' and 'johnny'."

  7. Review again and do one more round.

  8. Round 3 correction: This round should address a more nuanced concern — something the agent could not have known from the original prompt.

    Example: "The search works well. One more thing: add a maximum limit of 100 results per page to prevent clients from requesting the entire dataset. Return a 400 error if the client requests more than 100."

  9. After three rounds, review the final result. Compare it to what the initial prompt produced.

Expected Outcome

The final result after three rounds of feedback should be substantially better than the initial result. Each correction should have been short (one to two sentences) and specific. The agent should have preserved its previous good work while incorporating each correction. By round three, the agent should have a rich understanding of your preferences and intent that no single prompt could have conveyed.

Hints

  • Resist the urge to list all corrections at once. The point of this exercise is to practice the iterative loop, where each round builds on the last.
  • If the agent does not preserve previous corrections when applying new ones, be explicit: "Keep the case-insensitive matching from the previous change."
  • Notice how your corrections get more nuanced with each round. The first round catches obvious issues. The third round catches subtleties. This is the natural rhythm of the feedback loop.
  • If the agent nails it on the first try, choose a harder task. The exercise works best when the initial result is good but imperfect.

Exercise 4: Plan Mode Practice

Objective

Experience the difference between planned and unplanned execution, and learn when planning ahead improves outcomes.

Steps

  1. Choose a complex task that touches multiple files. Good candidates:

    • "Add request validation using a schema library (like Zod or Joi) to all API endpoints"
    • "Refactor the data access layer to use the repository pattern"
    • "Add comprehensive error handling with custom error classes and a centralized error handler"
  2. Phase A — Plan first. Use plan mode (in Claude Code) or suggest mode (in Codex CLI).

    Claude Code: Toggle to plan mode with Shift+Tab, or include "plan" in your prompt:

    Plan how to add request validation using Zod to all the API endpoints in src/api/. Don't make changes yet.
    

    Codex CLI: Use suggest mode so the agent proposes without executing:

    Suggest how to add request validation using Zod to all the API endpoints in src/api/.
    
  3. Review the plan. Look for:

    • Does the agent's approach make sense?
    • Is the order of operations logical?
    • Are there steps you disagree with?
    • Did the agent miss something important?
  4. Modify the plan if needed:

    Good plan, but two changes: (1) Start with the users endpoint as a prototype before doing the others.
    (2) Put the Zod schemas in a separate schemas/ directory, not inline in the route files.
    
  5. Execute the (modified) plan:

    Go ahead and implement this plan.
    
  6. Review the result from Phase A. Note the quality and how well it matched your expectations.

  7. Phase B — Direct execution. Start a fresh session (/clear or new terminal). Give the same task as a direct prompt without plan mode:

    Add request validation using Zod to all the API endpoints in src/api/. Put Zod schemas in a separate schemas/ directory.
    
  8. Compare the results from Phase A and Phase B:

    • Which produced better code?
    • Which was faster end-to-end (including the time you spent reviewing the plan)?
    • Did the plan catch anything that direct execution got wrong?
    • Did direct execution produce anything surprising — good or bad?

Expected Outcome

For a complex, multi-file task, the planned approach should produce a more coherent result because you had a chance to shape the strategy before execution. The direct approach might be faster for simple tasks but could produce inconsistencies across files for complex ones — for example, using slightly different patterns in different files because the agent made ad-hoc decisions at each step.

You should come away with a feel for the complexity threshold where planning pays off. Simple tasks (one file, clear outcome) do not need a plan. Complex tasks (multiple files, architectural decisions, new patterns) benefit from one.

Hints

  • If using Codex CLI in suggest mode, remember that you need to approve or direct the agent to actually make changes after reviewing the suggestions.
  • When modifying the plan, be specific about what to change. "Make it better" is as unhelpful in plan feedback as it is in task prompts.
  • Phase B should use the same constraints you added during plan review (like the schemas/ directory). The comparison should be between planned vs. unplanned execution, not between different requirements.
  • If both approaches produce identical results, your task might not be complex enough. Try something that requires more cross-file coordination.
  • Time both approaches. The overhead of planning is worth it only when it prevents rework. For a five-minute task, spending three minutes planning is not efficient. For a thirty-minute task, a five-minute plan that prevents a fifteen-minute redo is very efficient.

What to Take Away

After completing these exercises, you should have practical experience with:

  • Decomposition: Breaking large tasks into well-scoped subtasks with clear deliverables
  • Prompt calibration: Finding the sweet spot between too vague and too prescriptive
  • Iterative feedback: Using targeted corrections to refine results across multiple turns
  • Strategic planning: Knowing when to plan ahead and when to execute directly

These are not abstract skills — they are the daily mechanics of working effectively with coding agents. The more you practice them, the more intuitive they become.


title: "Hooks and Commands" last_updated: 2026-03-21 status: proven difficulty: intermediate prerequisites: [03-prompting-for-agents]

Hooks and Commands

Customizing agent behavior with event-driven workflows.

The Core Question: "How Do I Customize Agent Behavior?"

By now you have a working agent, a project memory file that teaches it your conventions, and prompting skills that get good results. But you have probably noticed a recurring frustration: you keep correcting the same things. The agent forgets to run the linter. It commits to the wrong branch. It edits a file you told it not to touch — three sessions ago, in a conversation that no longer exists.

Project memory helps. Writing "always run eslint after editing TypeScript files" in your CLAUDE.md or AGENTS.md reduces the frequency of these mistakes. But it does not eliminate them. Instructions in natural language are suggestions. The agent reads them, tries to follow them, and sometimes does not. There is no enforcement mechanism. No guarantee.

Hooks change that. A hook is a piece of code — usually a shell command or a short script — that runs automatically when the agent does something specific. It is not a suggestion. It is not a prompt. It is code that executes. If you want the linter to run after every file edit, a hook makes that happen every single time, regardless of what the agent remembers or forgets.

This is the difference between asking someone to wash their hands and installing a sensor that dispenses soap automatically. Both aim for the same outcome. One relies on memory and compliance. The other relies on a mechanism.

The Event-Driven Model

Agentic coding tools are built on a loop: the agent thinks, chooses an action, executes it, observes the result, and repeats. Each step in this loop is an event. The agent is about to call a tool — that is an event. The agent just finished calling a tool — that is an event. The agent is about to end its session — that is an event.

Hooks let you attach behavior to these events. You are not modifying the agent's reasoning or changing its model. You are inserting your own code at well-defined points in the agent's workflow. The agent decides to edit a file. Before that edit is applied, your hook runs. It might validate the change, log it, or block it entirely. After the edit is applied, another hook could run the formatter, execute tests, or send a notification.

This is the same pattern that powers Git hooks, CI/CD pipelines, and browser event listeners. If you have ever written a pre-commit hook or a GitHub Actions workflow, you already understand the model. The only difference is that the events come from an AI agent instead of a version control system or a browser.

The event model means hooks are composable. You can have one hook that logs all bash commands, another that blocks commits to main, and a third that runs tests after code changes. They operate independently, each responding to the events it cares about and ignoring the rest.

Why Hooks Matter: The Automation Progression

There is a natural progression in how developers address recurring problems with their agent:

Level 1: Manual correction. The agent does something wrong. You notice it during review. You tell the agent to fix it. This works, but it costs you time and attention every time it happens. It scales poorly — you catch the same issue across dozens of sessions.

Level 2: Project memory instruction. You add a line to your CLAUDE.md or AGENTS.md: "Always run tests after modifying code in the src/ directory." The agent reads this at the start of each session and usually follows it. Compliance is high but not perfect. The agent might skip it when the context window is full or when it is focused on a complex chain of reasoning.

Level 3: Hook. You write a PostToolUse hook that runs npm test whenever a file in src/ is modified. The tests run every time. The agent cannot forget, skip, or rationalize its way out of it. The behavior is guaranteed by code, not by prompt compliance.

Each level is appropriate for different situations. Manual correction is fine for one-off issues. Project memory is right for conventions that benefit from flexibility — the agent should usually follow them but might have good reasons to deviate. Hooks are right for non-negotiable requirements — things that must happen every time, with no exceptions.

The mistake most people make is staying at Level 1 for too long. If you find yourself correcting the same behavior more than three times, it is time to escalate to Level 2 or Level 3.

Types of Hooks

Hooks fall into four broad categories based on when they run and what they do:

Pre-action hooks run before the agent performs an action. They can inspect what the agent is about to do and optionally block it. A pre-action hook might prevent the agent from executing a dangerous bash command, block edits to protected files, or validate that certain conditions are met before a destructive operation proceeds.

Post-action hooks run after the agent has completed an action. They cannot block the action (it has already happened), but they can respond to it. A post-action hook might run a linter after a file edit, execute tests after code changes, or format code that the agent just wrote.

Notification hooks run when the agent reaches certain milestones — completing a task, encountering an error, or needing human input. They are useful for monitoring long-running tasks or integrating with external systems like Slack or email.

Validation hooks are a specialized form of pre-action hooks focused on checking conditions. They verify that the environment is in the right state before the agent proceeds — the correct branch is checked out, required environment variables are set, or certain files exist.

In practice, the boundaries between these categories blur. A single hook might validate, log, and conditionally block. The categories are useful for thinking about what you need, not for strict classification.

The Principle: Hooks Enforce What Project Memory Requests

The best way to think about the relationship between hooks and project memory is as two layers of the same system. Project memory describes your intent. Hooks enforce it.

Your CLAUDE.md says "run tests after modifying source files." That is a statement of policy. Your PostToolUse hook that executes the test suite after edits to src/ is the enforcement mechanism. The project memory instruction tells the agent why this matters (so it can make intelligent decisions when edge cases arise). The hook ensures it actually happens (so you do not need to trust the agent's compliance).

This layered approach is more robust than either mechanism alone. Project memory without hooks is aspirational. Hooks without project memory are opaque — the agent does not understand why things are happening around it, which makes it harder for it to work with the system rather than against it.

Custom Commands: Shortcuts for Complex Workflows

Beyond hooks, most agentic tools support custom commands — named shortcuts that trigger specific workflows. If you find yourself typing the same complex prompt repeatedly ("run the full test suite, report any failures, and if tests pass then run the linter"), a custom command turns that into a single invocation.

Custom commands differ from hooks in a key way: hooks are automatic (triggered by events), while commands are manual (triggered by you). Hooks are for things that should always happen. Commands are for things you want to happen on demand, with a convenient shortcut.

Think of hooks as cron jobs and commands as shell aliases. Both reduce repetition. They just operate at different levels of automation.

When to Use Hooks vs. Project Memory Instructions

Not everything should be a hook. Here is a practical decision framework:

Use a project memory instruction when:

  • The behavior benefits from flexibility. The agent should usually follow it but might have good reasons to deviate.
  • The requirement is about style, approach, or preference rather than correctness.
  • Enforcement is not critical — if the agent skips it once, nothing breaks.

Use a hook when:

  • The behavior must happen every time, with no exceptions.
  • The requirement can be verified programmatically (lint passes, tests pass, file exists).
  • You have corrected the agent about this more than three times and the pattern keeps recurring.
  • The behavior involves external tools (formatters, linters, notifiers) that the agent would need to remember to invoke.

Use both when:

  • You want the agent to understand why the requirement exists (project memory) and you want to guarantee it is met (hook).

The Cost of Over-Hooking

Hooks are powerful, and that power invites overuse. Every hook adds latency. A PostToolUse hook that runs a full test suite after every file edit means the agent waits for tests to complete before it can continue. If your test suite takes 30 seconds, and the agent edits 10 files in a task, you have added 5 minutes of waiting.

More subtly, too many hooks create confusion. If the agent is receiving output from five different hooks after every action, its context window fills with hook output instead of task-relevant information. The signal-to-noise ratio drops. The agent may start responding to hook output instead of focusing on the actual task.

There is also a maintenance cost. Hooks are code. Code has bugs. A hook that worked fine with your old project structure may break after a refactor. A hook that checks for a specific file might fail when that file is renamed. Hooks that call external services can time out. Every hook you add is a piece of infrastructure you need to maintain.

The practical guidance is: start with zero hooks. Add them one at a time, in response to specific problems you have observed. After each addition, use the agent for a few sessions and evaluate whether the hook is helping or hindering. Remove hooks that cause more friction than they prevent.

Security Considerations

Hooks run with your permissions. A hook attached to a PreToolUse event executes as your user, with access to your filesystem, your environment variables, your network, and your credentials. This is by design — hooks need access to your tools to be useful — but it means you should treat hook configuration with the same care you treat shell scripts.

Do not copy-paste hook configurations from untrusted sources without reading them. A malicious hook could exfiltrate environment variables, modify files outside your project, or install software. This is no different from the risk of running an untrusted shell script, but hooks have the additional property of running automatically and silently.

Keep hooks in version-controlled project configuration where your team can review them. Avoid hooks that download and execute remote code. If a hook needs elevated permissions, think carefully about whether it really needs to be a hook or whether it would be safer as a manual step.

Key Takeaways

  • Hooks turn recurring corrections into permanent automation. If you keep fixing the same agent behavior, a hook fixes it once and for all.
  • The automation progression is: manual correction, project memory instruction, hook. Escalate when the frequency of a problem justifies the investment.
  • Hooks are event-driven. They attach to specific points in the agent's workflow — before an action, after an action, on notification, or on session boundaries.
  • Hooks enforce what project memory requests. Use both layers together for the most robust configuration.
  • Custom commands are manual shortcuts. Use them for frequently repeated workflows that you want on demand rather than automatic.
  • Do not over-hook. Each hook adds latency, complexity, and maintenance burden. Add them one at a time, in response to observed problems.
  • Hooks run with your permissions. Treat hook configuration as seriously as you treat shell scripts. Review before you run.

title: "Hooks and Commands — Claude Code" tested_with: claude-code: "1.0.x" last_updated: 2026-03-21 status: proven difficulty: intermediate prerequisites: [03-prompting-for-agents]

Hooks and Commands in Claude Code

Claude Code's hooks system lets you attach shell commands to specific events in the agent's workflow. Hooks are configured in JSON, run as subprocesses, and their output is fed back to the agent as context.

Where Hooks Are Configured

Hooks live in your Claude Code settings file. There are two levels:

  • Project-level: .claude/settings.json in your project root. These hooks apply to everyone working on the project (once committed to version control).
  • User-level: ~/.claude/settings.json in your home directory. These hooks apply to all your projects.

Project-level hooks take precedence. If you define a hook for the same event at both levels, both run — project-level hooks first, then user-level hooks.

Hook Events

Claude Code exposes five hook events:

EventWhen It FiresCommon Uses
PreToolUseBefore the agent executes a tool callValidation, blocking, logging
PostToolUseAfter the agent executes a tool callLinting, formatting, testing
NotificationWhen the agent wants to notify the userExternal alerts, logging
StopWhen the agent finishes its responseSummary actions, cleanup
SubagentStopWhen a sub-agent finishesCoordination between agents

Each event provides environment variables with context about what triggered it. The most important ones:

  • $TOOL_NAME — the name of the tool being called (e.g., Bash, Edit, Write)
  • $TOOL_INPUT — the input passed to the tool (JSON-encoded)
  • $TOOL_OUTPUT — the output from the tool (only available in PostToolUse)
  • $SESSION_ID — the current session identifier

Hook Structure

A hook configuration has two parts: a matcher that determines which tool calls trigger the hook, and a hook command that runs when the matcher matches.

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hook": "echo 'Running bash command: $TOOL_INPUT'"
      }
    ]
  }
}

The matcher field filters by tool name. Common tool names include Bash, Edit, Write, Read, and Glob. If you omit the matcher, the hook runs for every tool call on that event.

You can also match on content patterns within the tool input. This lets you create hooks that only trigger for specific types of operations — for example, only bash commands that contain git commit.

Hook Behavior

When a hook runs:

  1. The command executes as a subprocess with your user permissions.
  2. Standard output from the hook is captured and fed back to the agent as additional context.
  3. For PreToolUse hooks, a non-zero exit code blocks the tool call. The agent sees the hook's output and can decide how to proceed.
  4. For PostToolUse hooks, exit codes do not block anything (the action already happened), but the output still reaches the agent.
  5. Hooks have a default timeout. Long-running hooks will be killed.

This means your hooks can communicate with the agent. If a linter hook prints error output, the agent sees those errors and can fix them. If a validation hook prints "BLOCKED: cannot commit to main branch," the agent reads that message and adjusts its approach.

10 Production-Ready Hooks

1. Lint on File Save

Run your linter every time the agent edits a file. The agent sees any lint errors and can fix them immediately.

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit",
        "hook": "cd $PROJECT_DIR && npx eslint --no-error-on-unmatched-pattern $(echo $TOOL_INPUT | jq -r '.file_path') 2>&1 || true"
      }
    ]
  }
}

Why it works: The linter output goes directly to the agent. If there are errors, the agent sees them in its next reasoning step and can issue a follow-up edit to fix them. The || true ensures the hook itself does not block — the lint output is informational.

2. Run Tests After Code Changes

Automatically run your test suite after the agent modifies source files.

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit",
        "hook": "cd $PROJECT_DIR && if echo $TOOL_INPUT | jq -r '.file_path' | grep -q 'src/'; then npm test 2>&1 | tail -20; fi"
      }
    ]
  }
}

Why it works: The grep -q 'src/' ensures tests only run when source files change, not when the agent edits configuration or documentation. The tail -20 keeps the output concise so it does not overwhelm the agent's context.

3. Prevent Commits to Main Branch

Block any attempt to commit directly to the main or master branch.

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hook": "if echo $TOOL_INPUT | grep -q 'git commit'; then BRANCH=$(git -C $PROJECT_DIR rev-parse --abbrev-ref HEAD); if [ \"$BRANCH\" = 'main' ] || [ \"$BRANCH\" = 'master' ]; then echo 'BLOCKED: Cannot commit directly to $BRANCH. Create a feature branch first.' && exit 1; fi; fi"
      }
    ]
  }
}

Why it works: This is a PreToolUse hook with a non-zero exit code, which means it actually prevents the tool call from executing. The agent sees the "BLOCKED" message and knows to create a branch first.

4. Log All Bash Commands for Audit

Write every bash command the agent runs to a log file with timestamps.

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hook": "echo \"$(date -Iseconds) | $SESSION_ID | $TOOL_INPUT\" >> $PROJECT_DIR/.claude/agent-commands.log"
      }
    ]
  }
}

Why it works: You get a complete, timestamped record of every command the agent executed. Useful for auditing, debugging, and understanding what the agent did during long sessions. Add .claude/agent-commands.log to your .gitignore.

5. Auto-Format Code After Edits

Run your code formatter after every file edit so the agent's output always matches your project's style.

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit",
        "hook": "cd $PROJECT_DIR && FILE=$(echo $TOOL_INPUT | jq -r '.file_path') && if echo $FILE | grep -qE '\\.(ts|tsx|js|jsx)$'; then npx prettier --write $FILE 2>&1; fi"
      }
    ]
  }
}

Why it works: Instead of telling the agent to format its code (which it may forget), the formatter runs automatically. The agent sees the formatter's output and learns the actual style, which improves future edits in the same session.

6. Notify on Task Completion

Send a desktop notification when the agent finishes a task, so you can work on other things while it runs.

{
  "hooks": {
    "Stop": [
      {
        "hook": "notify-send 'Claude Code' 'Task completed' 2>/dev/null || osascript -e 'display notification \"Task completed\" with title \"Claude Code\"' 2>/dev/null || true"
      }
    ]
  }
}

Why it works: The hook tries Linux (notify-send) and macOS (osascript) notification methods. The 2>/dev/null || true fallbacks ensure it does not fail on platforms where one method is unavailable.

7. Validate Environment Before Destructive Operations

Check that critical environment variables or files exist before allowing potentially destructive commands.

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hook": "if echo $TOOL_INPUT | grep -qE '(rm -rf|drop table|truncate|delete from)'; then if [ ! -f $PROJECT_DIR/.claude/destructive-ops-allowed ]; then echo 'BLOCKED: Destructive operation detected. Create .claude/destructive-ops-allowed to enable.' && exit 1; fi; fi"
      }
    ]
  }
}

Why it works: This adds a manual gate for dangerous operations. The agent cannot accidentally run rm -rf or database destructive commands unless you have explicitly opted in by creating a marker file.

8. Block Certain File Patterns from Editing

Prevent the agent from modifying files that should be hand-maintained, like migration files or generated code.

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit",
        "hook": "FILE=$(echo $TOOL_INPUT | jq -r '.file_path') && if echo $FILE | grep -qE '(migrations/|generated/|\\.lock$)'; then echo \"BLOCKED: $FILE is in a protected path. These files should not be edited by the agent.\" && exit 1; fi"
      }
    ]
  }
}

Why it works: Some files should never be touched by an automated tool — database migrations that have already been applied, lock files managed by package managers, generated code that will be overwritten. This hook makes those boundaries hard rather than advisory.

9. Add Timestamps to Generated Comments

Append a timestamp to any code comments the agent generates, so you can track when agent-generated code was written.

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write",
        "hook": "FILE=$(echo $TOOL_INPUT | jq -r '.file_path') && if [ -f \"$FILE\" ]; then sed -i \"s|// TODO|// TODO (agent $(date +%Y-%m-%d))|g\" $FILE 2>/dev/null; fi"
      }
    ]
  }
}

Why it works: This gives you a paper trail. When you encounter a TODO six months from now, the date tells you when it was created and that it came from an agent session. Adjust the pattern to match your project's comment convention.

10. Custom Commit Message Formatting

Ensure all commits made by the agent follow your team's commit message convention.

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hook": "if echo $TOOL_INPUT | grep -q 'git commit'; then if ! echo $TOOL_INPUT | grep -qE 'git commit -m \"(feat|fix|docs|refactor|test|chore):'; then echo 'BLOCKED: Commit message must follow conventional commits format (feat|fix|docs|refactor|test|chore): description' && exit 1; fi; fi"
      }
    ]
  }
}

Why it works: Rather than hoping the agent remembers your commit message format from the CLAUDE.md, this hook enforces it structurally. The agent sees the required format in the error message and retries with a properly formatted commit.

Custom Slash Commands

Claude Code supports custom slash commands — shortcuts you invoke with / followed by a command name. These are defined as Markdown files in your project's .claude/commands/ directory.

Creating a Command

Create a file at .claude/commands/test.md:

Run the full test suite for this project. Report:
1. Total tests run
2. Tests passed
3. Tests failed (with details for each failure)
4. Test coverage percentage if available

Use the project's standard test runner. Do not modify any test files.

Now you can type /test in Claude Code, and it will execute this workflow.

Command File Structure

Each command file is a Markdown document that serves as a prompt template. The filename (without .md) becomes the command name. You can organize commands in subdirectories:

.claude/
  commands/
    test.md           -> /test
    review.md         -> /review
    deploy-check.md   -> /deploy-check
    db/
      migrate.md      -> /db:migrate
      seed.md         -> /db:seed

Variable Substitution

Commands support the $ARGUMENTS placeholder, which is replaced with whatever the user types after the command name. For example, if review.md contains:

Review the following file for potential bugs, security issues, and style violations: $ARGUMENTS

Then typing /review src/auth.ts passes src/auth.ts into the prompt.

Team-Shared Commands

Because commands live in .claude/commands/, they are version-controlled with your project. Any team member who clones the repository gets the same set of commands. This is a good way to standardize workflows across a team without requiring everyone to configure their individual settings.

Good candidates for team-shared commands:

  • Running the test suite and interpreting results
  • Performing a pre-merge review checklist
  • Generating boilerplate for new components, endpoints, or modules
  • Running database operations in the correct sequence
  • Checking for common security issues

Debugging Hooks That Are Not Working

Hooks fail silently by default. If a hook is not doing what you expect, work through this checklist:

1. Verify the settings file location. Run claude config list or check that your .claude/settings.json is in the project root (not a subdirectory). User-level settings go in ~/.claude/settings.json.

2. Check JSON syntax. A single misplaced comma or missing bracket will cause the entire settings file to be ignored. Run your settings file through jq . to validate:

jq . .claude/settings.json

3. Test the hook command manually. Copy the hook command and run it in your terminal with the environment variables set manually. This isolates whether the problem is in the hook logic or in the hook system:

export TOOL_INPUT='{"command":"git status"}'
export PROJECT_DIR=$(pwd)
# paste your hook command here and run it

4. Check the matcher. Tool names are case-sensitive. bash will not match — it must be Bash. Common tool names: Bash, Edit, Write, Read, Glob, Grep.

5. Check for timeout issues. Hooks that take too long will be killed. If your hook runs a slow command (a full test suite, a remote API call), it may be hitting the timeout. Test the command's execution time independently.

6. Look for permission errors. If the hook calls a tool that requires specific permissions or environment variables, make sure those are available in the hook's execution context. Hooks inherit your user environment but may not have the same shell initialization (.bashrc, .zshrc may not be sourced).

7. Use echo statements for tracing. Add echo "HOOK FIRED: ..." to the beginning of your hook command. If you see the output in the agent's context, the hook is firing. If not, the matcher is not matching.

8. Check for conflicting hooks. If you have hooks at both the project and user level for the same event, they both run. This can cause unexpected behavior if they interact — for example, two hooks both trying to format the same file.

Putting It Together

A well-configured Claude Code project typically has:

  • A CLAUDE.md that describes conventions and intent.
  • A .claude/settings.json with 2-5 hooks that enforce the most critical of those conventions.
  • A .claude/commands/ directory with 3-10 custom commands for common workflows.

The CLAUDE.md tells the agent what to do. The hooks ensure it happens. The commands give you efficient shortcuts. Together, they create a development environment where the agent is not just capable but consistently aligned with your project's standards.

Start minimal. Add hooks and commands as you discover pain points. Review your configuration monthly and remove anything that is not pulling its weight.


title: "Hooks and Commands — Codex CLI" tested_with: codex-cli: "0.2.x" last_updated: 2026-03-21 status: proven difficulty: intermediate prerequisites: [03-prompting-for-agents]

Hooks and Commands in Codex CLI

Codex CLI takes a different approach to customization than Claude Code. Where Claude Code provides a rich hooks system for attaching behavior to tool events, Codex CLI relies on its sandbox model, approval modes, and instruction configuration to shape agent behavior. Understanding these differences helps you make the right customization choices for each tool.

Configuration in Codex CLI

Codex CLI is configured through multiple layers:

  • codex.yaml — A project-level configuration file in your repository root. This defines default model, approval mode, and project-specific settings.
  • AGENTS.md — The project memory file, equivalent to Claude Code's CLAUDE.md. This is where you write behavioral instructions for the agent.
  • ~/.codex/config.yaml — User-level configuration that applies across all projects.
  • Command-line flags — Override any configuration for a single session.

A typical codex.yaml looks like this:

model: o4-mini
approval_mode: suggest
project_doc: AGENTS.md

The approval_mode setting is the most important customization lever in Codex CLI. It fundamentally changes how the agent interacts with your system.

Approval Modes as Behavioral Control

Codex CLI offers three approval modes, and choosing the right one is the primary way you customize agent behavior:

suggest mode: The agent can read files and propose changes, but every write operation and command execution requires your explicit approval. This is the most conservative mode. Use it when you are exploring a new codebase, working on sensitive code, or when you want to review every action.

auto-edit mode: The agent can read and write files without approval, but command execution (bash commands, running tests) still requires your confirmation. This is a good middle ground — the agent can edit code freely but cannot run anything that might have side effects.

full-auto mode: The agent can read, write, and execute commands without approval. Operations are sandboxed (more on this below), but within the sandbox the agent operates autonomously. Use this for well-understood tasks where you trust the agent and want maximum speed.

These modes give you coarse-grained but effective control over agent behavior. Where Claude Code uses hooks to selectively allow or block specific operations, Codex CLI uses approval modes to set the overall autonomy level and the sandbox to enforce boundaries.

The Sandbox Model

Codex CLI's most distinctive feature is its sandbox. In full-auto mode, commands run inside a sandboxed environment that restricts:

  • Network access — The agent cannot make outbound network requests by default. This prevents accidental data exfiltration, unintended API calls, and downloading of unknown code.
  • Filesystem access — The agent can only access the project directory and specific allowed paths. It cannot read or modify files outside the project.
  • Process isolation — Commands run in a contained environment that limits their blast radius.

The sandbox serves the same purpose as many Claude Code hooks — preventing dangerous operations — but it does so at the infrastructure level rather than the event-driven level. You do not need a hook to prevent rm -rf / because the sandbox will not allow it.

This is an important architectural difference. Claude Code hooks are opt-in protection: you add hooks to block specific behaviors. Codex CLI's sandbox is opt-out restriction: everything is blocked by default, and you allow specific behaviors. The sandbox model is more secure by default but less flexible for fine-grained customization.

Configuring Sandbox Permissions

You can expand the sandbox's permissions when your workflow requires it:

sandbox:
  allow_network:
    - "registry.npmjs.org"
    - "api.github.com"
  allow_paths:
    - "/tmp"
    - "~/.npm"

Be deliberate about what you allow. Each permission you add weakens the sandbox. If you find yourself allowing everything, you are better off using auto-edit mode with manual command approval instead.

Behavioral Configuration Through AGENTS.md

Since Codex CLI does not have an event-driven hooks system, your primary tool for shaping agent behavior is the AGENTS.md file. This makes the quality of your project memory instructions even more important than it is with Claude Code.

Where a Claude Code user might write a brief instruction in CLAUDE.md and back it up with a hook, a Codex CLI user needs the instruction itself to be thorough enough that the agent follows it reliably.

Effective patterns for AGENTS.md behavioral control:

## Mandatory Workflow

Before modifying any file in `src/`, ALWAYS:
1. Run `npm test` to establish a baseline
2. Make your changes
3. Run `npm test` again to verify nothing broke
4. Run `npx eslint src/` to check for lint errors

If any step fails, fix the issue before proceeding.

## Forbidden Operations

NEVER:
- Commit directly to the `main` or `master` branch
- Modify files in `migrations/` — these are immutable after creation
- Delete test files
- Run `npm publish` or any deployment command

## Required Patterns

All new functions must include:
- JSDoc comments with @param and @returns
- At least one unit test in the corresponding .test.ts file
- Error handling for invalid inputs

These instructions work well in Codex CLI because the agent's instruction-following is strong and the sandbox prevents the worst outcomes even if the agent deviates. The combination of clear instructions plus sandbox constraints gives you reasonable control without hooks.

Practical Customization Examples

Example 1: Enforcing Test-Driven Workflow

In Claude Code, you might use a PostToolUse hook to run tests after every edit. In Codex CLI, you configure this through instructions and approval mode:

AGENTS.md:

## Test-Driven Workflow

For every code change:
1. Write or identify the relevant test FIRST
2. Run the test to see it fail (or confirm existing tests pass)
3. Make the code change
4. Run the test suite to verify
5. Do not consider the task complete until all tests pass

The test command is: `npm test`

codex.yaml:

approval_mode: auto-edit

With auto-edit mode, the agent can write code freely but must get your approval before running npm test. This gives you a natural checkpoint to verify the agent is following the test-driven workflow.

Example 2: Safe Database Operations

## Database Safety

This project uses PostgreSQL. The following rules are absolute:
- NEVER run DROP, TRUNCATE, or DELETE without WHERE on any production table
- Always use transactions for multi-step database operations
- New migrations go in `db/migrations/` with the naming format: `YYYYMMDD_HHMMSS_description.sql`
- Test all SQL against the development database before suggesting changes to production configs

Combined with sandbox network restrictions, the agent cannot accidentally connect to a production database even if it ignores the instructions.

Example 3: Code Review Workflow

## Review Checklist

When asked to review code, follow this checklist:
1. Check for security issues (SQL injection, XSS, auth bypasses)
2. Check for error handling (are errors caught? are they logged?)
3. Check for test coverage (are new code paths tested?)
4. Check for naming conventions (camelCase for variables, PascalCase for types)
5. Check for documentation (public APIs must have JSDoc)
6. Summarize findings as a numbered list with severity (critical/warning/info)

This achieves through instructions what Claude Code might achieve through a custom slash command. The result is the same — a standardized review process — but the mechanism is different.

Where Codex CLI Customization Is More Limited

Honest assessment of the gaps:

No event-driven hooks. You cannot automatically run a command every time the agent edits a file. You rely on instructions and manual approval instead. This means behaviors that Claude Code can guarantee, Codex CLI can only strongly encourage.

No custom slash commands. There is no equivalent to Claude Code's .claude/commands/ directory. You can define workflow prompts in your AGENTS.md, but there is no shortcut invocation mechanism. Workaround: create a section in AGENTS.md called "Workflows" and reference them by name in your prompts — "Follow the Review Checklist workflow."

No post-action automation. After the agent edits a file, you cannot automatically trigger a linter, formatter, or test run. The agent must decide to do it (based on instructions) or you must approve it (based on approval mode). Workaround: use auto-edit mode and make the agent's instructions explicit about running verification commands. Then approve those commands when they come up.

No output interception. You cannot inject messages into the agent's context based on tool output. In Claude Code, a hook can analyze tool output and feed modified information back to the agent. Codex CLI does not have this capability. The agent sees raw tool output only.

Workarounds and Patterns

Despite these limitations, experienced Codex CLI users have developed effective patterns:

The pre-prompt script. Before starting a Codex CLI session, run a shell script that checks prerequisites — correct branch, clean working tree, environment variables set. This is manual but ensures the agent starts from a known-good state.

#!/bin/bash
# pre-session.sh
echo "Checking prerequisites..."
BRANCH=$(git rev-parse --abbrev-ref HEAD)
if [ "$BRANCH" = "main" ]; then
  echo "ERROR: You are on main. Create a feature branch first."
  exit 1
fi
if ! git diff --quiet; then
  echo "WARNING: You have uncommitted changes."
fi
echo "Ready for Codex CLI session."

The verification prompt. End every task with a standard verification request: "Before you finish, run the test suite, run the linter, and confirm all checks pass." Experienced users keep this as a text snippet they paste at the end of every session.

The wrapper script. Create a shell alias or script that launches Codex CLI with your preferred settings:

#!/bin/bash
# codex-work.sh
codex --model o4-mini --approval-mode auto-edit --project-doc AGENTS.md "$@"

The post-session hook. After the Codex CLI session ends, run your verification suite manually:

#!/bin/bash
# post-session.sh
echo "Running post-session checks..."
npm test && npx eslint src/ && echo "All checks passed." || echo "ISSUES FOUND — review before committing."

These patterns are less elegant than Claude Code's integrated hooks, but they are effective and easy to understand. They follow the Unix philosophy of small, composable tools — even if the composition is manual rather than automatic.

Making the Choice

If your primary concern is automated enforcement — things must happen every time without exception — Claude Code's hooks system is better suited.

If your primary concern is security and containment — preventing the agent from doing harm — Codex CLI's sandbox model is stronger by default.

If you use both tools (many teams do), let each play to its strengths. Use Claude Code for complex, multi-step workflows where hooks ensure quality gates are met. Use Codex CLI for exploratory work where the sandbox keeps experiments contained.

Neither tool's customization system is complete. Both are evolving rapidly. The principles matter more than the specific mechanisms: define your requirements clearly, enforce the critical ones structurally, and verify the rest through review.


title: "Exercises — Hooks and Commands" last_updated: 2026-03-21 status: proven difficulty: intermediate prerequisites: [03-prompting-for-agents]

Exercises — Hooks and Commands

These exercises build your practical skills with hooks and custom commands. Each one addresses a real customization need that you will encounter in production use. Work through them in order — each exercise builds on concepts from the previous one.


Exercise 1: Your First Hook

Objective

Add a simple logging hook that prints a message every time the agent runs a bash command. Verify that it works by running a task. Then refine it to only log commands that modify files.

Steps

  1. Open your project's .claude/settings.json file (create it if it does not exist). If you do not have a project handy, create a temporary one with mkdir -p /tmp/hook-practice/.claude && cd /tmp/hook-practice.

  2. Add a PreToolUse hook that targets the Bash tool and prints a message. Start with this configuration:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hook": "echo '[HOOK] Bash command detected: '$(echo $TOOL_INPUT | head -c 200)"
      }
    ]
  }
}
  1. Start a Claude Code session and ask the agent to do something that involves bash commands — for example, "List all files in this directory and show me the disk usage."

  2. Observe: does the [HOOK] message appear in the agent's output? If yes, the hook is firing correctly.

  3. Now refine the hook. Modify it so that it only logs bash commands that are likely to modify files. Change the hook command to check for keywords like rm, mv, cp, mkdir, touch, sed, chmod, or redirect operators (>, >>):

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hook": "if echo $TOOL_INPUT | grep -qE '(rm |mv |cp |mkdir |touch |sed |chmod |> |>>)'; then echo '[HOOK] File-modifying command detected: '$(echo $TOOL_INPUT | head -c 200); fi"
      }
    ]
  }
}
  1. Test again with a task that involves both read-only commands (like ls, cat, git status) and file-modifying commands. Verify that only the modifying commands trigger the log message.

Expected Outcome

  • The basic hook fires on every bash command and you see the [HOOK] prefix in the agent's context.
  • The refined hook only fires on commands that modify files.
  • You understand the feedback loop: hook output goes to the agent, which means the agent "sees" your log messages.

Hints

  • If the hook does not fire, check that the matcher value is Bash with a capital B. Tool names are case-sensitive.
  • If the settings file is ignored entirely, validate your JSON with jq . .claude/settings.json. A syntax error anywhere in the file causes the whole file to be skipped.
  • The head -c 200 truncation is important — without it, very long commands will flood the agent's context.
  • Remember that hook output is visible to the agent. In a real workflow, you would probably log to a file instead of echoing to stdout, unless you want the agent to react to the log message.

Exercise 2: The Quality Gate

Objective

Create a hook that runs your project's linter after every file edit. If the lint fails, the output should inform the agent so it can fix the issues automatically.

Steps

  1. Choose a project that has a linter configured. If you do not have one, set up a minimal project:
mkdir -p /tmp/lint-practice/src && cd /tmp/lint-practice
npm init -y
npm install --save-dev eslint
npx eslint --init  # choose a basic configuration
mkdir -p .claude
  1. Create a source file with an intentional lint error:
// src/example.js
var x = 1
var y = 2
console.log(x)
// y is declared but never used — this should trigger a lint warning
  1. Add a PostToolUse hook to .claude/settings.json that runs the linter after any Edit or Write operation:
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit",
        "hook": "cd $PROJECT_DIR && npx eslint $(echo $TOOL_INPUT | jq -r '.file_path') 2>&1 || true"
      },
      {
        "matcher": "Write",
        "hook": "cd $PROJECT_DIR && npx eslint $(echo $TOOL_INPUT | jq -r '.file_path') 2>&1 || true"
      }
    ]
  }
}
  1. Start a Claude Code session and ask the agent: "Edit src/example.js to add a new function called add that takes two numbers and returns their sum."

  2. Observe: after the agent edits the file, the linter runs automatically. If there are lint errors (including the pre-existing y unused variable), the agent should see them in its context.

  3. Follow up with: "Fix any lint errors in the file." The agent should use the lint output it already received to identify and fix the issues.

  4. Verify that after the fix, the linter hook runs again and reports no errors.

Expected Outcome

  • The linter runs automatically after every edit — you never have to ask for it.
  • The agent sees lint output and can act on it.
  • After the agent fixes the lint errors, the subsequent lint run is clean.
  • You have a working quality gate that ensures code is always linted.

Hints

  • The || true at the end of the hook command is critical. Without it, lint failures cause the hook to exit with a non-zero code. For PostToolUse hooks this does not block anything, but it can produce confusing error messages.
  • If you use a linter other than ESLint (Ruff for Python, Clippy for Rust, etc.), adjust the command accordingly. The pattern is the same: run the linter on the edited file and let the output flow back to the agent.
  • If jq is not installed, you can use a simpler approach: echo $TOOL_INPUT | python3 -c "import sys,json; print(json.load(sys.stdin)['file_path'])".
  • Watch the agent's behavior carefully. Some agents will preemptively fix lint issues on subsequent edits because they learned from the hook output earlier in the session. This is the feedback loop working as intended.

Exercise 3: Custom Command

Objective

Create a custom slash command for Claude Code that runs your project's full test suite and reports results in a structured format. The command should be reusable by any team member who clones the repository.

Steps

  1. Set up a project with a test suite. If you do not have one, create a minimal project:
mkdir -p /tmp/command-practice/src && cd /tmp/command-practice
npm init -y
npm install --save-dev jest
mkdir -p .claude/commands

Create a simple source file and test:

// src/math.js
function add(a, b) { return a + b; }
function subtract(a, b) { return a - b; }
module.exports = { add, subtract };
// src/math.test.js
const { add, subtract } = require('./math');
test('add', () => expect(add(1, 2)).toBe(3));
test('subtract', () => expect(subtract(5, 3)).toBe(2));

Add to package.json: "scripts": { "test": "jest" }

  1. Create the custom command file at .claude/commands/test.md:
Run the project's full test suite and report the results.

Steps:
1. Run `npm test` and capture the full output
2. Report the results in this exact format:

## Test Results

- **Total tests**: [number]
- **Passed**: [number]
- **Failed**: [number]
- **Duration**: [time]

### Failures (if any)
For each failed test, report:
- Test name
- Expected vs. actual
- Relevant file and line

### Summary
One sentence: are we good to ship, or are there issues to address?

Do not modify any source or test files. This is a read-only operation.
  1. Start a Claude Code session and type /test. Observe the agent executing the test suite and formatting the results according to your command template.

  2. Now create a second command. Create .claude/commands/review.md:

Review the file specified by the user for code quality issues.

$ARGUMENTS

Check for:
1. **Bugs**: Logic errors, off-by-one errors, null/undefined risks
2. **Security**: Input validation, injection risks, hardcoded secrets
3. **Performance**: Unnecessary loops, missing early returns, repeated computations
4. **Style**: Naming conventions, function length, comment quality
5. **Testing**: Is this code adequately tested? What test cases are missing?

Format findings as a numbered list with severity labels: CRITICAL, WARNING, or INFO.
End with a one-line summary of overall code health.
  1. Test with /review src/math.js. Verify that the $ARGUMENTS placeholder is replaced with the file path.

  2. Commit both command files to version control. Verify that a teammate (or you, in a fresh clone) has access to the same commands.

Expected Outcome

  • /test runs the test suite and produces a formatted report without you specifying the steps each time.
  • /review <file> runs a structured code review with the $ARGUMENTS substitution working correctly.
  • Both commands are checked into the repository under .claude/commands/ and available to anyone who clones the project.
  • You understand how custom commands reduce repetition for common workflows.

Hints

  • The command filename (without .md) becomes the slash command name. Use kebab-case for multi-word commands: deploy-check.md becomes /deploy-check.
  • Commands in subdirectories use colon separators: .claude/commands/db/migrate.md becomes /db:migrate.
  • Keep commands focused on a single workflow. If a command tries to do too many things, split it into multiple commands.
  • The $ARGUMENTS placeholder only works if included in the command file. If you forget it, the command ignores any arguments the user passes.
  • Commands are essentially prompt templates. They do not execute code directly — they instruct the agent. The agent then decides which tools to use. This means you should write commands in clear, imperative language.

Exercise 4: The Hook Audit

Objective

Install five hooks from the examples in this module. Use the agent for 30 minutes of real work with all hooks active. Evaluate which hooks were genuinely useful and which caused friction. Remove the ones that were not helpful. Document your reasoning.

Steps

  1. Choose a real project you actively work on. This exercise only works with genuine tasks, not toy examples.

  2. Install these five hooks in your .claude/settings.json. Adapt the commands to your project's language and tooling:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hook": "echo \"$(date -Iseconds) | $TOOL_INPUT\" >> .claude/agent-commands.log"
      },
      {
        "matcher": "Bash",
        "hook": "if echo $TOOL_INPUT | grep -q 'git commit'; then BRANCH=$(git rev-parse --abbrev-ref HEAD); if [ \"$BRANCH\" = 'main' ] || [ \"$BRANCH\" = 'master' ]; then echo 'BLOCKED: Cannot commit to main/master.' && exit 1; fi; fi"
      },
      {
        "matcher": "Edit",
        "hook": "FILE=$(echo $TOOL_INPUT | jq -r '.file_path') && if echo $FILE | grep -qE '(\\.lock$|migrations/)'; then echo \"BLOCKED: $FILE is protected.\" && exit 1; fi"
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit",
        "hook": "cd $PROJECT_DIR && FILE=$(echo $TOOL_INPUT | jq -r '.file_path') && if echo $FILE | grep -qE '\\.(js|ts|py)$'; then echo '[LINT CHECK]' && npx eslint $FILE 2>&1 | tail -5 || true; fi"
      }
    ],
    "Stop": [
      {
        "hook": "notify-send 'Claude Code' 'Task completed' 2>/dev/null || osascript -e 'display notification \"Task completed\" with title \"Claude Code\"' 2>/dev/null || true"
      }
    ]
  }
}

The five hooks are:

  • Audit log: Logs all bash commands to a file.
  • Branch protection: Blocks commits to main/master.
  • File protection: Blocks edits to lock files and migrations.
  • Auto-lint: Runs the linter after file edits.
  • Completion notification: Sends a desktop notification when the agent finishes.
  1. Work with the agent for 30 minutes on real tasks. Do not change your workflow to accommodate the hooks — work as you normally would.

  2. During the session, keep brief notes. For each hook, record:

    • Did it fire? How often?
    • When it fired, was the output helpful or distracting?
    • Did it catch a real problem or prevent a real mistake?
    • Did it slow you down noticeably?
    • Did it confuse the agent (the agent reacting to hook output in unhelpful ways)?
  3. After 30 minutes, stop and evaluate. Create a file called .claude/hook-audit.md with your findings. Use this template:

# Hook Audit — [Date]

## Hooks Tested

### 1. Audit Log (PreToolUse/Bash)
- **Fired**: [how many times]
- **Useful**: [yes/no]
- **Keep**: [yes/no]
- **Reasoning**: [why]

### 2. Branch Protection (PreToolUse/Bash)
- **Fired**: [how many times]
- **Useful**: [yes/no]
- **Keep**: [yes/no]
- **Reasoning**: [why]

### 3. File Protection (PreToolUse/Edit)
- **Fired**: [how many times]
- **Useful**: [yes/no]
- **Keep**: [yes/no]
- **Reasoning**: [why]

### 4. Auto-Lint (PostToolUse/Edit)
- **Fired**: [how many times]
- **Useful**: [yes/no]
- **Keep**: [yes/no]
- **Reasoning**: [why]

### 5. Completion Notification (Stop)
- **Fired**: [how many times]
- **Useful**: [yes/no]
- **Keep**: [yes/no]
- **Reasoning**: [why]

## Summary
- Hooks kept: [list]
- Hooks removed: [list]
- Key lesson: [one sentence]
  1. Based on your evaluation, update .claude/settings.json to only include the hooks you decided to keep.

  2. Review the audit log file (.claude/agent-commands.log). Is there anything surprising in what the agent ran? Did the log reveal commands you would not have noticed otherwise?

Expected Outcome

  • You have hands-on experience with five different hooks in a real workflow.
  • You have a documented evaluation of each hook's practical value.
  • Your settings file contains only hooks that earned their place.
  • You understand that more hooks are not always better — each one has a cost in latency, complexity, and context window usage.
  • You have an informed opinion about which types of hooks provide the most value for your specific workflow.

Hints

  • The most common result is keeping 2-3 hooks out of 5. This is normal and expected. Hooks that sound useful in theory often create friction in practice.
  • The audit log hook is almost always kept — it has near-zero cost and provides valuable forensics. The auto-lint hook is the most polarizing — some find it essential, others find it too noisy.
  • If a hook confuses the agent (the agent starts responding to hook output instead of your requests), that is a strong signal to remove it or reduce its verbosity.
  • Pay attention to latency. If you notice the agent pausing after every edit while the linter runs, measure how much time that adds. If it is 2 seconds, probably worth it. If it is 15 seconds, probably not.
  • The notification hook is often removed by people who work in a single terminal (they already see when the agent finishes) and kept by people who switch to other tasks while the agent works.
  • Your audit document is the real deliverable of this exercise. The evaluation skill — knowing which automation to keep and which to discard — is more valuable than any individual hook.

title: "Sub-Agents" last_updated: 2026-03-21 status: proven difficulty: intermediate prerequisites: [03-prompting-for-agents]

Sub-Agents

The Core Question: When and How Do I Delegate to Child Agents?

At some point, a single agent conversation hits a wall. The task is too big, the context window is too full, or two parts of the work could run simultaneously but cannot because the agent can only do one thing at a time. You find yourself wishing you had a second pair of hands.

Sub-agents are that second pair of hands. A sub-agent is a child agent spawned by a parent agent to handle a specific piece of work. It gets its own context window, does its job, and returns a result to the parent. The parent coordinates, the children execute. This is the same division of labor you see in any well-run engineering team — a tech lead does not write every line of code personally. They delegate, review, and synthesize.

Understanding when and how to use sub-agents is the difference between an agentic workflow that handles complex tasks gracefully and one that chokes on anything larger than a single file change.

The Delegation Principle

Think about how you delegate to a junior developer. You do not hand them a vague gesture toward the codebase and say "make it better." You give them a specific task, the context they need to complete it, and clear criteria for what done looks like. Then you step back and let them work.

Sub-agent delegation follows the same pattern. The parent agent writes a prompt — the handoff — that tells the child agent what to do, what context it needs, and what to return. The quality of that handoff determines the quality of the result. A vague handoff produces vague results. A precise handoff produces precise results.

This is worth internalizing because it means your skill at writing good prompts — the skill you developed in Module 03 — directly translates to your skill at orchestrating sub-agents. The prompt is the handoff. Everything the sub-agent needs to know must be in that prompt, because the sub-agent does not inherit the parent's full conversation history.

When to Use Sub-Agents

Sub-agents earn their keep in specific situations. Reaching for them reflexively wastes tokens. Reaching for them at the right moment transforms what is possible.

Independent tasks. When a task decomposes into parts that do not depend on each other, sub-agents can execute those parts simultaneously. Adding tests to three unrelated modules, updating documentation for five separate APIs, implementing three independent features — these are natural candidates.

Parallel work. A single agent is sequential. Sub-agents let you fan out. If you need to research three different approaches before deciding which to pursue, three sub-agents can explore all three in the time one agent would explore one.

Isolation needed. Some tasks benefit from a clean context window. If the parent agent's context is already full of information about the frontend, sending a sub-agent to work on the backend means the backend work gets a fresh, uncluttered context. The sub-agent is not confused by irrelevant details.

Exploratory research. When you need to understand something before acting on it — how a library works, what an unfamiliar module does, where a pattern is used across the codebase — a research sub-agent can investigate without cluttering the parent's context with raw findings. It returns a summary, not a firehose.

Large-scale refactoring. When a change touches many files that can be modified independently, sub-agents can each handle a subset while the parent coordinates the overall effort.

When NOT to Use Sub-Agents

Sub-agents are not free. They have startup cost, they consume additional tokens, and they introduce coordination complexity. Use them when the benefit exceeds the overhead.

Tightly coupled tasks. If step two depends on the exact output of step one, and step three depends on step two, running these as sub-agents adds coordination overhead without parallelism. Just do them sequentially in the parent.

Simple sequential work. If the task is straightforward and fits comfortably in a single context window, sub-agents add complexity for no gain. Do not use a team when one person is enough.

Context sharing is critical. Sub-agents get their own context window. They do not see the parent's full conversation history. If the task requires deep understanding of a long, nuanced discussion that has been building up in the parent session, a sub-agent will not have that understanding. You would have to summarize the entire discussion into the handoff prompt, and something will inevitably be lost.

The task is too small. Spawning a sub-agent has overhead — the system prompt loads, the agent orients itself, it reads necessary files. If the actual work is a five-line change, the overhead exceeds the work. Just do it in the parent.

The Fan-Out/Fan-In Pattern

The most common sub-agent pattern is fan-out/fan-in. It works like this:

  1. Decompose. The parent identifies the pieces of the task that can run independently.
  2. Fan out. The parent spawns one sub-agent for each piece, giving each a clear prompt.
  3. Execute. The sub-agents work simultaneously (or sequentially, depending on the tool).
  4. Fan in. The sub-agents return their results to the parent.
  5. Synthesize. The parent combines the results, resolves any inconsistencies, and produces the final output.

This pattern is powerful because it trades tokens for time. Three sub-agents running in parallel cost the same total tokens as one agent doing the work sequentially, but they finish faster. And each sub-agent gets a clean context window focused on its specific piece, which often produces better results than one agent trying to juggle everything.

The synthesis step is where the parent earns its keep. Raw sub-agent output often needs integration — making sure naming is consistent across the three new modules, resolving a conflict between two sub-agents' approaches, deciding which of three research summaries is most relevant.

Context Boundaries

This is the single most important thing to understand about sub-agents: they get their own context window. They do not inherit the parent's full state.

When a parent spawns a sub-agent, it writes a prompt. That prompt — plus the system prompt and any files the sub-agent reads — is the sub-agent's entire world. It does not know what the parent discussed five messages ago. It does not know about the decision the parent made three steps back. It does not know about the other sub-agents running in parallel.

This is a feature, not a bug. Clean context boundaries are what make sub-agents effective. A research sub-agent that inherits 50,000 tokens of irrelevant parent context is worse off than one that starts fresh with a clear, focused prompt. But it means you must be intentional about what you put in the handoff.

The practical implication: when prompting for sub-agent usage, include all relevant context in the prompt you expect the parent to pass along. Do not assume the sub-agent will "just know" things from the parent conversation. If it matters, say it explicitly.

Sub-Agent Types

Different types of work call for different types of sub-agents. While the underlying mechanism is the same — spawn an agent with a prompt — the intent and behavior patterns differ.

Research and exploration agents. Their job is to investigate and report back. They read files, trace through code, search for patterns, and return a summary. They do not modify anything. Use them when you need to understand something before deciding what to do.

Implementation agents. Their job is to make changes. They edit files, create new files, and verify their work. Use them for well-defined implementation tasks where the requirements are clear and the work is independent.

Review agents. Their job is to evaluate work. They read code, check for issues, compare against conventions, and report findings. Use them after implementation to catch problems before you commit.

Coordination Patterns

How sub-agents relate to each other determines the coordination pattern.

Independent. Each sub-agent works on a completely separate piece. They do not interact. The parent dispatches them, collects results, and synthesizes. This is the simplest pattern and the most common.

Sequential. The output of one sub-agent feeds into the next. A research agent investigates, its findings go into a planning agent, the plan goes into an implementation agent. The parent acts as the relay, passing context between stages.

Parallel. Multiple sub-agents run simultaneously on related but independent work. This maximizes throughput but requires that the tasks genuinely be independent — if two sub-agents try to edit the same file, you have a conflict.

The Overhead Question

Sub-agents are not free. Every sub-agent incurs costs:

  • Token cost. The sub-agent's system prompt, the handoff prompt, and everything it reads and generates all consume tokens. If the task is trivial, the overhead exceeds the work.
  • Startup time. The sub-agent needs to orient itself — read files, understand context, plan its approach. For very small tasks, this startup time is disproportionate.
  • Coordination cost. The parent spends tokens managing sub-agents — writing handoff prompts, reading results, synthesizing output. More sub-agents means more coordination.

The break-even point is roughly this: if the sub-task would take more than a few minutes in the parent and benefits from isolation, parallelism, or a fresh context window, a sub-agent is worthwhile. If the sub-task is quick, sequential, and benefits from the parent's existing context, just do it inline.

Error Handling

Sub-agents can fail. They can misunderstand the task, go off on a tangent, produce incorrect output, or hit errors they cannot recover from. The parent must be prepared for this.

In practice, error handling for sub-agents is not fundamentally different from error handling for any delegated work. You review the output. If it is wrong, you correct course — either by giving the sub-agent more specific instructions and trying again, or by handling the task differently.

The key insight is that sub-agent failures are contained. A sub-agent that goes off the rails does not corrupt the parent's context or pollute the work of other sub-agents. This is one of the advantages of isolation — failure stays local.

Worktrees: Isolated File Systems for Parallel Editing

One challenge with parallel sub-agents is file conflicts. If two sub-agents try to edit the same file simultaneously, the results are unpredictable. Worktrees solve this problem.

A worktree is an isolated copy of your file system — specifically, a git worktree that shares the same repository history but has its own working directory. Each sub-agent that runs in a worktree gets its own copy of the files. It can edit freely without conflicting with other agents or the parent. When it finishes, its changes can be merged back.

This pattern is essential for large-scale parallel refactoring. If you want three sub-agents to each update a different set of files, and some of those files might overlap (e.g., a shared import file), worktree isolation prevents conflicts. Each sub-agent works in its own sandbox, and the merge step handles integration.

Not every sub-agent task needs worktree isolation. If the sub-agents are working on truly independent files with no overlap, standard parallel execution is fine. Worktrees matter when there is any risk of file-level conflict.

Key Takeaways

  • Sub-agents are child agents that handle specific pieces of a larger task. The parent coordinates, the children execute.
  • The prompt is the handoff. Sub-agents do not inherit the parent's context. Everything they need must be in the prompt they receive.
  • Use sub-agents for independent, parallel, or isolation-needing work. Do not use them for tightly coupled sequential tasks or trivially small work.
  • The fan-out/fan-in pattern is the workhorse. Decompose, dispatch, execute, collect, synthesize.
  • Context boundaries are a feature. Clean separation keeps sub-agents focused, but it means you must be explicit about what they need to know.
  • Sub-agents have overhead. Token cost, startup time, and coordination cost are real. Use sub-agents when the benefit exceeds this cost.
  • Worktrees solve file conflicts. When parallel sub-agents might edit overlapping files, worktree isolation prevents collisions.
  • Sub-agent failures are contained. A failing sub-agent does not corrupt the parent or other sub-agents. Review output, correct course, and move on.

title: "Sub-Agents — Claude Code" tested_with: claude-code: "1.0.x" last_updated: 2026-03-21 status: proven difficulty: intermediate prerequisites: [03-prompting-for-agents]

Sub-Agents — Claude Code

The Agent Tool

Claude Code has a built-in mechanism for sub-agents called the Agent tool. When the main agent decides it needs to delegate work, it invokes the Agent tool with a prompt describing what the sub-agent should do. The sub-agent spins up, does its work, and returns a result to the parent.

You do not configure the Agent tool. It is available by default. The main agent decides when to use it based on the task at hand — though you can influence this decision by how you phrase your prompts.

The Agent tool is not a separate binary or service. It is another invocation of Claude within the same Claude Code session, but with its own context window and its own tool access. The sub-agent can read files, search the codebase, run commands, and make edits, just like the parent. What it cannot do is see the parent's conversation history or communicate with other sub-agents directly.

Built-In Sub-Agent Types

Claude Code recognizes several sub-agent patterns, each optimized for a different kind of work.

General-purpose agent. The default. The parent sends a prompt, the sub-agent executes it, and returns the result. No special constraints. This is what you get when you say "use a sub-agent to handle X."

Explore agent. Optimized for research and investigation. An Explore agent reads files, traces through code, searches for patterns, and returns a summary of its findings. It is designed to look without touching — it investigates but does not modify files. Use it when you need to understand something before acting.

Plan agent. Optimized for designing an approach before implementation. A Plan agent analyzes the task, considers the codebase structure, and produces a step-by-step plan. It does not implement the plan itself. The parent can then execute the plan directly or dispatch implementation agents for each step.

These are not hard categories with different underlying models. They are behavioral patterns — the system prompt and instructions given to the sub-agent differ, which shapes its behavior. The Explore agent is told to investigate and report; the Plan agent is told to analyze and design. The underlying capability is the same.

Custom Sub-Agent Types

You can define your own sub-agent types by creating files in the .claude/agents/ directory in your project. Each file defines a sub-agent persona with a specific system prompt, constraints, and behavior.

A custom agent file is a Markdown file that describes the agent's role and instructions. For example, you might create .claude/agents/security-reviewer.md that instructs the sub-agent to review code specifically for security vulnerabilities, or .claude/agents/api-designer.md that focuses on REST API design patterns used by your team.

Custom agents are useful when you have recurring delegation patterns. Instead of writing the same detailed handoff prompt every time you want a security review, you define the agent once and reference it by name.

How to Trigger Sub-Agents

The main agent decides when to use sub-agents based on its assessment of the task. For complex, multi-part tasks, it will often spawn sub-agents without being asked. But you can influence this decision — and sometimes you should.

Explicit prompting. Tell the agent to use sub-agents directly. The agent responds to clear instructions about delegation.

Implicit triggering. Describe a task that naturally decomposes into independent parts. A request like "add comprehensive tests for modules A, B, and C" is a natural candidate for parallel sub-agents, and the agent may choose to use them without prompting.

The most reliable approach is explicit prompting. If you want sub-agents, say so. If you want parallelism, say so. The agent has good judgment about when sub-agents are appropriate, but your explicit instruction removes ambiguity.

Prompting for Sub-Agent Usage

Here are patterns that reliably trigger sub-agent behavior in Claude Code:

Research dispatch:

Use an explore agent to investigate how authentication is implemented
across this codebase. I need to understand the flow from login to
token validation, including any middleware involved.

Parallel implementation:

I need to add input validation to three endpoints: /users, /orders,
and /products. These are independent. Use parallel agents to implement
all three simultaneously. Each should follow the validation pattern
used in /auth/login.

Review dispatch:

I just finished implementing the new caching layer. Dispatch a review
agent to check the implementation for correctness, edge cases, and
consistency with the rest of the codebase.

Planning:

Before we start implementing the new notification system, use a plan
agent to design the approach. It should consider the existing event
system, the database schema, and the API layer.

Multi-step with coordination:

This migration needs three phases:
1. First, use an explore agent to map all usages of the old API
2. Then, use a plan agent to design the migration path
3. Finally, use parallel agents to update each module

Wait for each phase to complete before starting the next.

Notice the pattern in each of these prompts: they specify what kind of work the sub-agent should do, they provide the context the sub-agent needs, and they describe what the result should look like. The prompts are complete enough that the sub-agent can succeed without asking follow-up questions.

The Sub-Agent Lifecycle

Understanding the lifecycle helps you predict behavior and debug issues.

  1. Spawn. The parent decides to use a sub-agent and invokes the Agent tool with a prompt. The prompt describes the task, provides necessary context, and sets expectations for the output.

  2. Initialize. The sub-agent starts with a fresh context window. It receives its system prompt (which may differ based on agent type) and the task prompt from the parent. It does not see the parent's prior conversation.

  3. Execute. The sub-agent works through the task using the same tools available to the parent — file reading, search, bash execution, file editing. It may read many files, run commands, and iterate on its approach.

  4. Return. When the sub-agent completes its task, it returns a result to the parent. This result is a text summary — what it did, what it found, what it changed. The parent receives this as tool output and decides what to do next.

  5. Terminate. The sub-agent's context window is discarded. If you need to continue work that a sub-agent started, you either have the parent do it or spawn a new sub-agent with appropriate context.

Context Passing

What the sub-agent knows and what it does not is the most common source of sub-agent problems.

What the sub-agent gets:

  • Its system prompt (general-purpose, explore, plan, or custom)
  • The task prompt written by the parent
  • Anything it reads from the file system during execution
  • Output from any commands it runs

What the sub-agent does NOT get:

  • The parent's conversation history
  • Results from other sub-agents (unless the parent explicitly includes them in the prompt)
  • The parent's CLAUDE.md context (the sub-agent reads CLAUDE.md itself from disk, but it may not prioritize the same sections)
  • Any mental model the parent has built up over the course of the session

This means the task prompt must be self-contained. If the parent has spent ten messages discussing a specific approach with you, and then dispatches a sub-agent to implement that approach, the sub-agent knows nothing about those ten messages. The parent must distill the relevant decisions into the handoff prompt.

In practice, the parent agent is usually good at writing handoff prompts — it knows what the sub-agent needs. But when you notice a sub-agent producing unexpected results, the first thing to check is whether the handoff prompt contained sufficient context.

Worktree Isolation

When sub-agents need to modify files in parallel, worktree isolation prevents conflicts. A worktree gives each sub-agent its own copy of the working directory, backed by the same git repository.

Worktree isolation matters when:

  • Multiple sub-agents might edit the same file
  • You want clean separation between parallel changes
  • You need the ability to review each sub-agent's changes independently before merging

When a sub-agent runs in a worktree, it operates on a separate checkout. Its file edits do not affect the main working directory or other sub-agents' worktrees. When it finishes, the changes exist as a separate branch or set of modifications that can be merged.

You can prompt for worktree isolation explicitly:

Use worktree-isolated agents to refactor the logging system in
parallel. Agent 1 should handle src/api/, Agent 2 should handle
src/workers/, and Agent 3 should handle src/services/. Each might
need to update shared imports, so keep them isolated.

For tasks where the files are completely independent — for example, adding tests to three modules that share no code — worktree isolation is unnecessary overhead. Use it when there is a real risk of file-level conflict.

Practical Pattern: Research Agent

Use this when you need to understand something before making decisions.

Explore the codebase to find all API endpoints. For each endpoint,
identify:
- The HTTP method and path
- The handler function and its file location
- Any middleware applied
- Whether it has tests

Return a structured summary organized by module.

The explore agent reads route definitions, traces through middleware chains, checks for test files, and returns a comprehensive map. The parent can then use this map to plan further work — identifying untested endpoints, finding inconsistent patterns, or planning a migration.

Practical Pattern: Parallel Implementation

Use this when you have multiple independent implementation tasks.

Add input validation to these three endpoints using the Zod schema
pattern established in src/validation/schemas.ts:

1. POST /api/users - validate email (must be valid format),
   name (required, 1-100 chars), role (must be 'admin' or 'user')
2. POST /api/orders - validate userId (required UUID),
   items (non-empty array), total (positive number)
3. POST /api/products - validate name (required, 1-200 chars),
   price (positive number), category (must be from VALID_CATEGORIES)

Use parallel agents, one per endpoint. Each should add the validation
schema, integrate it into the handler, and add tests for both valid
and invalid inputs.

Each sub-agent gets a clear, self-contained specification. They can all reference the existing Zod schema pattern independently. The parent collects the results and verifies consistency.

Practical Pattern: Review Agent

Use this after implementation to catch issues before committing.

Review the changes I just made to the authentication system.
Specifically check:
- Are there any security issues with the token generation?
- Does the error handling cover all failure cases?
- Are the new functions consistent with the patterns in the rest
  of src/auth/?
- Are there any missing tests?

Be specific about any issues found. Reference exact file paths
and line numbers.

The review agent reads the changed files, compares them against existing patterns, and produces a findings report. This is especially valuable because the review happens in a fresh context — the agent is not anchored by the assumptions that built up during implementation.

Practical Pattern: Plan Agent

Use this before implementation to design the approach.

I need to add a notification system to this application. Users should
receive notifications when:
- An order they placed changes status
- A product on their wishlist goes on sale
- An admin sends a broadcast message

Design an implementation plan. Consider the existing event system in
src/events/, the database schema, and the API patterns used elsewhere.
The plan should cover: data model changes, new API endpoints, event
handlers, and a testing strategy.

The plan agent investigates the existing architecture, identifies integration points, and produces a structured plan. The parent — and you — can review the plan before committing to implementation, catching design issues early.

Effective Sub-Agent Prompts

The quality of the handoff prompt is the single biggest lever on sub-agent performance. Follow these principles:

Be complete. Include everything the sub-agent needs. Do not rely on context it does not have. If there is a relevant decision from earlier in your session, state it explicitly.

Be specific. "Improve the tests" is a bad handoff. "Add edge case tests for the parseDate function covering null input, invalid formats, and timezone boundaries" is a good handoff.

Include constraints. If there are things the sub-agent should not do — do not modify the database schema, do not change the public API, use only the existing test framework — say so in the prompt.

Describe the expected output. "Return a summary of findings" is vague. "Return a list of all endpoints, grouped by module, with the HTTP method, path, and whether tests exist" is precise.

Reference existing patterns. If the sub-agent should follow a pattern that exists in the codebase, point to a specific file: "Follow the pattern in src/validators/user.ts." This gives the sub-agent a concrete example to work from.

Monitoring Sub-Agent Progress

When a sub-agent is running, Claude Code shows its activity in the terminal. You can see which files it reads, which commands it runs, and what edits it makes. This visibility lets you catch problems early — if a sub-agent is reading irrelevant files or going down an unproductive path, the parent will eventually see the result and can try again.

For long-running sub-agent tasks, be patient. Sub-agents go through the same orient-plan-execute cycle as the parent. They read files, think, try things, and iterate. This takes time, especially for exploratory tasks where the sub-agent is mapping unfamiliar territory.

Cost Implications

Each sub-agent uses its own tokens. A task that spawns three sub-agents will use roughly three times the tokens of doing the same work sequentially in the parent — sometimes more, because each sub-agent incurs its own startup overhead (reading files, orienting itself).

The tradeoff is time. Three parallel sub-agents finish in roughly the time of one. If the task is time-sensitive or the work is substantial, the extra token cost is worth it.

For cost-conscious workflows, consider whether sub-agents are necessary for each task. A quick five-line change does not justify the overhead of a sub-agent. A 200-line implementation across three modules does.

You can see token usage in Claude Code's output. Pay attention to this during your early sub-agent experiments to build intuition for what tasks are cost-effective to delegate.

Continuing a Sub-Agent's Work with SendMessage

Sometimes a sub-agent finishes its initial task but you realize follow-up work is needed. Rather than spawning an entirely new sub-agent that must re-orient from scratch, you can use SendMessage to continue a previous sub-agent's conversation.

SendMessage lets the parent send additional instructions to a sub-agent that has already completed a task. The sub-agent retains its context from the first interaction — the files it read, the understanding it built, the changes it made — and can continue where it left off. This avoids the redundant startup cost of a fresh sub-agent re-reading the same files and rebuilding the same understanding.

Use SendMessage when:

  • A sub-agent completed its task but the results need refinement
  • You want to ask follow-up questions about a research agent's findings
  • An implementation sub-agent needs to handle a case it missed

Do not use SendMessage as a substitute for a well-specified initial prompt. If you find yourself sending many follow-up messages to a sub-agent, that usually means the initial handoff prompt was underspecified.


title: "Sub-Agents — Codex CLI" tested_with: codex-cli: "0.2.x" last_updated: 2026-03-21 status: proven difficulty: intermediate prerequisites: [03-prompting-for-agents]

Sub-Agents — Codex CLI

How Codex Handles Task Delegation

Codex CLI, OpenAI's open-source terminal coding agent, approaches sub-agents differently than Claude Code. Where Claude Code has a first-class Agent tool that the model invokes directly within a session, Codex CLI operates as a single-agent system by default. There is no built-in mechanism for the running Codex agent to spawn child agents within the same session.

This does not mean you cannot achieve sub-agent-like workflows with Codex. It means the patterns are different, and in some cases you orchestrate the delegation yourself rather than relying on the tool to handle it internally.

The Codex Sub-Agent Model

Codex CLI is designed around a single-agent loop: you give it a task, it works through that task step by step, and it returns the result. Its strength is in focused, sequential execution — reading files, reasoning about code, making edits, running commands to verify, and iterating.

Within this single-agent loop, Codex can decompose complex tasks into steps and execute them in order. It does this naturally, without sub-agents, by maintaining its plan in context and working through each step. For many tasks, this is sufficient and efficient. The overhead of sub-agent coordination is avoided entirely.

The key difference from Claude Code's model: in Claude Code, the parent agent can dispatch a sub-agent and continue thinking about the broader task while the sub-agent works on a specific piece. In Codex CLI, the agent handles everything sequentially within one context window. This means Codex is simpler to reason about but lacks the parallelism and context isolation that sub-agents provide.

Available Delegation Patterns

While Codex CLI does not have built-in sub-agents, several delegation patterns are available.

Sequential decomposition within a single session. Codex handles multi-step tasks by working through them one at a time. You can prompt for explicit decomposition:

Break this task into independent steps and execute them one at a time:
1. Add input validation to the /users endpoint
2. Add input validation to the /orders endpoint
3. Add input validation to the /products endpoint

Complete each step fully before moving to the next.

Codex will treat this as a sequential checklist, completing each item before moving on. You get the same work done, just not in parallel.

Scoped sessions. You can run multiple Codex sessions yourself, each focused on a different part of the task. Open three terminal windows, run codex in each, and give each one a specific, self-contained task. You are the orchestrator — you decompose the task, dispatch each session, and synthesize the results.

# Terminal 1
codex "Add input validation to POST /api/users following the pattern in src/validation/schemas.ts"

# Terminal 2
codex "Add input validation to POST /api/orders following the pattern in src/validation/schemas.ts"

# Terminal 3
codex "Add input validation to POST /api/products following the pattern in src/validation/schemas.ts"

This gives you parallelism, but you manage the coordination. If the sessions edit overlapping files, you will need to resolve conflicts manually.

Non-interactive mode for batch tasks. Codex CLI's non-interactive mode lets you script multiple invocations:

codex --quiet "Add unit tests for src/auth/login.ts"
codex --quiet "Add unit tests for src/auth/register.ts"
codex --quiet "Add unit tests for src/auth/reset-password.ts"

Each invocation is a separate session with its own context. You can run them sequentially in a script or in parallel using shell background processes. This is the closest analog to Claude Code's fan-out pattern.

Using the OpenAI Agents SDK for Orchestration

For teams that need true programmatic sub-agent orchestration, Codex CLI's underlying architecture integrates with the OpenAI Agents SDK. The Agents SDK is a Python framework for building multi-agent systems, and Codex CLI can serve as a tool within that framework.

The pattern looks like this:

  1. Define a parent agent using the Agents SDK that understands the overall task.
  2. Give the parent access to Codex as a tool — it can invoke Codex sessions programmatically.
  3. The parent decomposes the task and dispatches Codex invocations for each piece.
  4. Results flow back to the parent agent, which synthesizes them.

This approach gives you full control over orchestration — parallel execution, custom error handling, result aggregation — at the cost of building the orchestration layer yourself. It is appropriate for teams building repeatable agentic workflows, CI/CD pipelines, or automated code review systems, not for ad-hoc interactive use.

A minimal example of this pattern:

from agents import Agent, Runner
import subprocess

def run_codex(task: str) -> str:
    result = subprocess.run(
        ["codex", "--quiet", "--json", task],
        capture_output=True, text=True
    )
    return result.stdout

# Define tools that wrap Codex invocations
# The parent agent orchestrates multiple Codex calls

The Agents SDK handles the orchestration primitives — deciding when to dispatch, collecting results, managing failures. Codex handles the actual coding work within each invocation.

Practical Examples of Delegation

Research before implementation. Run a Codex session to investigate, then use the findings in a second session to implement.

# Session 1: Research
codex "Analyze how error handling is done across this codebase. List every pattern you find with file locations and examples. Write your findings to /tmp/error-handling-analysis.txt"

# Session 2: Implement (after reviewing the analysis)
codex "Based on the error handling patterns documented in /tmp/error-handling-analysis.txt, add consistent error handling to src/api/orders.ts"

The file system acts as the context bridge between sessions. The first session writes findings to a file; the second session reads them. This is explicit and inspectable — you can review the analysis before proceeding.

Parallel file modifications. Use git branches to isolate parallel work.

# Create branches for each task
git checkout -b feat/validate-users
codex "Add input validation to POST /api/users"
git add -A && git commit -m "Add user endpoint validation"

git checkout main
git checkout -b feat/validate-orders
codex "Add input validation to POST /api/orders"
git add -A && git commit -m "Add order endpoint validation"

# Merge branches
git checkout main
git merge feat/validate-users
git merge feat/validate-orders

This is more manual than Claude Code's worktree isolation, but it achieves the same goal: isolated file systems for parallel work, with git handling the merge.

Review as a separate session. After implementation, run a fresh Codex session specifically for review.

# After making changes, get a fresh-eyes review
codex "Review the changes in the current git diff. Check for bugs, security issues, missing edge cases, and style inconsistencies. Do not make any changes — just report findings."

The fresh session provides the same benefit as a sub-agent review in Claude Code: a clean context without the assumptions that built up during implementation.

Limitations Compared to Claude Code

Be clear-eyed about what Codex CLI does not do in this area:

No in-session sub-agents. The running Codex agent cannot spawn a child agent within the same session. It cannot delegate a piece of work to a focused sub-agent while continuing to think about the broader task.

No built-in parallel execution. Codex works sequentially within a session. Parallelism requires multiple sessions, which you orchestrate.

No automatic context passing between sessions. When you run multiple Codex sessions, each starts fresh. There is no built-in mechanism for one session to pass context to another. You handle this through files, git branches, or your own orchestration layer.

No sub-agent types. Codex does not have the concept of explore agents, plan agents, or custom agent types. Every session is a general-purpose agent. You shape its behavior through your prompt, not through a type system.

No SendMessage equivalent. You cannot continue a previous Codex session with follow-up instructions. Each codex invocation is independent. If you need follow-up work, you start a new session with appropriate context.

Workarounds for Missing Features

The limitations are real but not insurmountable. Here are practical workarounds.

For context isolation: use the file system. Write intermediate results to files. A research session writes findings to a temp file; an implementation session reads that file. This is more explicit than automatic context passing, which can be an advantage — you see exactly what context is being shared.

For parallelism: use shell parallelism. Run multiple Codex sessions in parallel using background processes, tmux panes, or separate terminals. You manage the coordination, but you get true parallelism.

For agent types: use prompt engineering. Instead of selecting an agent type, write your prompt to shape the agent's behavior. "Investigate and report without making changes" produces explore-like behavior. "Design an implementation plan before writing any code" produces plan-like behavior.

For sub-agent orchestration: use the Agents SDK. If you need programmatic orchestration, build it with the OpenAI Agents SDK. This is more work upfront but gives you full control.

Using Multiple Codex Sessions as a Manual Sub-Agent Pattern

The most practical sub-agent pattern for day-to-day Codex use is the manual multi-session approach. You act as the parent agent: you decompose the task, dispatch sessions, and synthesize results.

The workflow:

  1. Decompose the task yourself. Identify the independent pieces.
  2. Write a clear prompt for each piece. Each prompt should be self-contained — do not assume the session knows anything about the other sessions.
  3. Run sessions in parallel. Use separate terminals or background processes.
  4. Review each session's output independently.
  5. Integrate the results. Merge branches, resolve conflicts, verify consistency.

This is more manual than Claude Code's sub-agent system, but it has advantages: you have full visibility into each session's work, you control the decomposition, and you can intervene at any point. For developers who prefer tight control over their agentic workflows, this is often the preferred approach regardless of which tool they use.

The key discipline is writing good prompts for each session. Since there is no parent agent writing handoff prompts for you, you must do this yourself. Make each prompt specific, complete, and self-contained. Reference specific files and patterns. Describe the expected output. Include constraints. Everything from Module 03 about prompt quality applies here, multiplied by the number of sessions you are managing.


title: "Exercises — Sub-Agents" last_updated: 2026-03-21 status: proven difficulty: intermediate prerequisites: [03-prompting-for-agents]

Exercises — Sub-Agents

These exercises are designed to build your intuition for when and how to delegate to sub-agents. Work through them in order — each builds on skills from the previous one.

Use your own real project for all exercises. The observations you make will be specific to your codebase, which is exactly what makes them valuable.


Exercise 1: The Research Dispatch

Objective

Learn how exploration sub-agents work by dispatching one to map your project's architecture, then comparing its findings to your own understanding.

Steps

  1. Before starting, write down your own mental model of your project's architecture. Spend five minutes listing the major modules, their responsibilities, and how they connect. Keep this list private — do not share it with the agent.

  2. Start a new agent session in your project directory.

  3. Give the agent this prompt (adapt the project name):

    Use an explore agent to investigate this codebase and produce an
    architecture summary. It should identify:
    - The major modules or components
    - The responsibility of each
    - The key dependencies between them
    - Any patterns or conventions it notices
    - Areas that seem under-tested or under-documented
    
    The explore agent should read broadly, not just the top-level files.
    Return a structured summary.
    
  4. Wait for the sub-agent to complete. Read its full report.

  5. Compare the sub-agent's findings against your private list. Note:

    • What did the sub-agent find that you forgot or did not think of?
    • What did you know that the sub-agent missed?
    • Where did the sub-agent's understanding differ from yours?
  6. If using Codex CLI, run this as a standalone session with a similar prompt. The exercise works the same way — the tool handles it in a single session rather than dispatching a sub-agent, but the comparison is equally valuable.

Expected Outcome

The sub-agent should produce a reasonable architecture map. It will likely find structural patterns you take for granted and miss domain-specific knowledge that is not in the code. The delta between your understanding and the agent's tells you what context is missing from your project memory files — useful input for Module 02.

Hints

  • If the sub-agent's report is superficial, the codebase may be large enough that it needs more specific direction. Try narrowing the scope: "Focus on the src/api/ directory."
  • If you use Claude Code, watch which files the explore agent reads. This tells you how the agent navigates your codebase — useful information for writing future prompts.

Exercise 2: Fan-Out/Fan-In

Objective

Experience the fan-out/fan-in pattern firsthand by dispatching parallel sub-agents for independent tasks and observing how the parent coordinates the results.

Steps

  1. Identify three independent tasks in your project. Good candidates:

    • Add tests to three different modules
    • Add input validation to three separate endpoints
    • Write documentation for three unrelated functions
    • Fix three independent linting issues

    The key requirement: the tasks must be truly independent. No shared files, no ordering dependencies.

  2. Write a single prompt that asks the agent to use parallel sub-agents:

    I need these three tasks done in parallel. Use a separate agent for each:
    
    1. [First task with full context]
    2. [Second task with full context]
    3. [Third task with full context]
    
    Each task is independent — the agents should not need to coordinate.
    After all three finish, summarize what was done.
    
  3. Observe the execution. Pay attention to:

    • How the parent decomposes the work
    • Whether the sub-agents truly run in parallel
    • How the parent synthesizes the results
    • How long the total process takes
  4. After completion, review each sub-agent's work individually. Check for consistency — did all three follow the same patterns, naming conventions, and code style?

  5. If using Codex CLI, run three separate sessions in parallel (three terminal windows or background processes). You act as the parent — decompose, dispatch, and synthesize yourself.

Expected Outcome

The three tasks should complete faster than they would sequentially (in Claude Code). The results should be functionally correct but may have minor inconsistencies in style or approach — this is the tradeoff of parallel independent work. The synthesis step (done by the parent or by you) is where those inconsistencies get resolved.

Hints

  • If you cannot identify three independent tasks, creating test files for three separate modules is almost always a safe choice.
  • If one sub-agent fails while the others succeed, that is fine. Note what caused the failure — it is usually an underspecified prompt. The other sub-agents' work is not affected.
  • For Codex CLI users: use git stash or branches to isolate each session's changes if they might conflict.

Exercise 3: The Handoff Quality Test

Objective

Demonstrate the impact of prompt quality on sub-agent output by running the same task with a vague prompt and a detailed prompt, then comparing results.

Steps

  1. Choose a single, moderately complex task. Good examples:

    • Add comprehensive error handling to a module
    • Refactor a function to follow a specific pattern
    • Add validation to a data processing pipeline
  2. Write two prompts for the same task.

    The vague prompt:

    Use a sub-agent to improve the error handling in src/api/orders.ts.
    

    The detailed prompt:

    Use a sub-agent to add error handling to src/api/orders.ts. Specifically:
    - Wrap the database calls in try/catch blocks
    - Use the AppError class from src/errors/AppError.ts for all thrown errors
    - Log errors using the logger from src/utils/logger.ts
    - Return appropriate HTTP status codes: 400 for validation errors,
      404 for not-found, 500 for unexpected errors
    - Follow the pattern established in src/api/users.ts
    - Add tests for each error case in src/api/__tests__/orders.test.ts
    Do not modify any other files.
    
  3. Run the vague prompt first. Save the results (use git diff > /tmp/vague-result.diff or similar).

  4. Revert the changes (git checkout .).

  5. Run the detailed prompt. Save these results too.

  6. Compare:

    • Did both produce working code?
    • Which matched your project's conventions better?
    • Which required less follow-up correction?
    • How did token usage compare?

Expected Outcome

The detailed prompt should produce significantly better results — closer to your expectations, more consistent with existing patterns, fewer issues to fix. The vague prompt will likely produce something that technically works but diverges from your project's style or misses important requirements.

This exercise makes the abstract principle concrete: the prompt is the handoff, and handoff quality determines output quality.

Hints

  • Revert completely between runs. Use git checkout . to reset all changes. You want a clean comparison.
  • If both prompts produce identical results, your task may be too simple. Try something with more design choices.
  • Document the specific differences you observe. This becomes reference material for writing better prompts in the future.

Exercise 4: Worktree Isolation

Objective

Understand how worktree isolation prevents conflicts when parallel sub-agents modify overlapping parts of the codebase.

Steps

  1. Identify a refactoring task that touches shared code. Good candidates:

    • Renaming a commonly-imported utility function
    • Changing a shared type definition that multiple modules reference
    • Updating a logging pattern across several files that all import the same logger
  2. First, try without isolation. Ask the agent to use parallel sub-agents to make the changes across multiple modules without worktree isolation:

    Refactor the [shared pattern] across these three modules in parallel:
    - src/api/users.ts
    - src/api/orders.ts
    - src/api/products.ts
    
    Each may need to update the shared import in src/utils/[shared-file].ts.
    Use parallel agents without worktree isolation.
    
  3. Observe what happens. Note any conflicts, overwrites, or inconsistencies.

  4. Revert the changes.

  5. Now try with worktree isolation:

    Refactor the [shared pattern] across these three modules in parallel.
    Use worktree-isolated agents so each has its own copy of the files:
    - Agent 1: src/api/users.ts
    - Agent 2: src/api/orders.ts
    - Agent 3: src/api/products.ts
    
    Each may need to update the shared import in src/utils/[shared-file].ts.
    Use worktree isolation to prevent conflicts.
    
  6. Compare the two approaches:

    • Did the non-isolated version produce conflicts?
    • How did the isolated version handle the shared file?
    • Was the merge process smooth?
  7. For Codex CLI users: simulate worktree isolation using git branches. Create a branch for each task, run a separate Codex session on each branch, then merge them. Note any merge conflicts and how you resolve them.

Expected Outcome

The non-isolated version should produce at least one conflict or overwrite in the shared file. The isolated version should handle the parallel work cleanly, with each sub-agent's changes preserved in its own worktree. The merge step may still require resolving conflicting changes to the shared file, but the conflicts will be explicit and manageable rather than silent overwrites.

Hints

  • If your codebase does not have obvious shared files, create a small test scenario: three modules that all import the same utility, and a task to update how that utility is called.
  • Worktree isolation adds overhead. The point of this exercise is to experience both approaches so you can make an informed choice about when the overhead is justified.
  • If you are on Codex CLI and git merge conflicts arise, that is the expected behavior. The exercise is about understanding why isolation matters, not about avoiding all friction.

Exercise 5: The Cost Calculator

Objective

Build intuition for the cost-benefit tradeoff of sub-agents by comparing a single-session approach against a sub-agent approach for the same task.

Steps

  1. Choose a task that is complex enough to benefit from sub-agents but feasible in a single session. Good examples:

    • Add input validation and tests to 5 API endpoints
    • Write documentation for 4 modules
    • Add logging to 6 service functions
  2. Run the task in a single session (no sub-agents):

    Add input validation to all five of these endpoints: /users, /orders,
    /products, /inventory, /reports. Do them one at a time, sequentially.
    Follow the pattern in src/validation/schemas.ts. Add tests for each.
    

    Record:

    • Total wall-clock time from start to finish
    • Total token usage (check your API dashboard or the tool's output)
    • Quality of the output (does it work? is it consistent?)
  3. Revert all changes.

  4. Run the same task with sub-agents:

    Add input validation to these five endpoints. Use parallel agents,
    one per endpoint:
    1. /users - [specific validation requirements]
    2. /orders - [specific validation requirements]
    3. /products - [specific validation requirements]
    4. /inventory - [specific validation requirements]
    5. /reports - [specific validation requirements]
    
    Each agent should follow the pattern in src/validation/schemas.ts
    and add tests. Run all five in parallel.
    

    Record the same metrics: time, tokens, quality.

  5. Compare:

    • Time. The sub-agent version should be faster in wall-clock time, possibly significantly.
    • Tokens. The sub-agent version will likely use more total tokens due to startup overhead for each sub-agent (each reads files independently).
    • Quality. Check for consistency. Did the single-session version produce more consistent code (because one agent did everything)? Did the sub-agent version have style inconsistencies across the five endpoints?
  6. Calculate the effective cost-per-minute for each approach. If the sub-agent version costs $2 in tokens but takes 5 minutes, and the single-session version costs $1.20 but takes 15 minutes, what is your time worth?

Expected Outcome

You should find that sub-agents trade tokens for time. The parallel approach finishes faster but uses more total tokens. Quality is comparable, though the single-session approach may be slightly more consistent because one agent maintained a unified mental model across all five tasks.

The "right" choice depends on your situation. If you are blocked waiting for the agent and your time is expensive, sub-agents win. If you are running tasks in the background and token cost matters more than speed, a single session is more economical.

Hints

  • Record token usage before and after each run. Most tools show cumulative token usage somewhere in their interface or API dashboard.
  • If you cannot measure tokens precisely, estimate based on session length and number of tool calls. The relative comparison is more important than absolute numbers.
  • For Codex CLI: the "sub-agent" version means running five parallel Codex sessions. The comparison is still valid — you are comparing sequential vs. parallel execution.
  • Do not obsess over the exact numbers. The goal is to build intuition, not to produce a precise cost analysis. After this exercise, you should have a gut sense for when sub-agents are worth the overhead and when they are not.

title: "MCP Servers" last_updated: 2026-03-21 status: experimental difficulty: intermediate prerequisites: [04-hooks-and-commands]

MCP Servers

Coming in V2. This module is scaffolded with an outline.

Core Question

How do I extend agent capabilities with external tools and data?

Module Outline

Concepts

  1. What is MCP (Model Context Protocol)? — An open protocol for connecting AI agents to external tools, data sources, and services. MCP servers expose capabilities that agents can discover and invoke.

  2. Why MCP matters — Without MCP, your agent can only read files, run commands, and search. With MCP, it can query databases, call APIs, access knowledge bases, manage cloud infrastructure, and more.

  3. The MCP architecture — Client (the AI agent) ↔ Server (your tool/service) ↔ Resource (the actual data or capability). Servers declare what they can do; clients discover and use those capabilities.

  4. Built-in vs. custom MCP servers — Many tools come with pre-built MCP servers (databases, GitHub, Slack, etc.). You can also build custom servers for your specific tools and workflows.

  5. When to use MCP vs. bash commands — MCP provides structured, discoverable interfaces. Bash is ad-hoc. Use MCP when you want the agent to understand what's available and use it reliably.

  6. Security considerations — MCP servers can access sensitive resources. Permission models, scoping, and audit logging matter.

Tool-Specific Content

  • Claude Code: MCP server configuration in settings.json, built-in MCP support, connecting to databases, APIs, custom tools
  • Codex CLI: MCP support status, alternative approaches, Agents SDK integration

Exercises

  1. Connect to a pre-built MCP server (e.g., filesystem, GitHub)
  2. Use an MCP-connected agent to query a database
  3. Build a minimal custom MCP server
  4. Compare agent performance with and without MCP tools available

Key Resources


title: "Session Architecture" last_updated: 2026-03-21 status: experimental difficulty: intermediate prerequisites: [05-sub-agents]

Session Architecture

Coming in V2. This module is scaffolded with an outline.

Core Question

How do I structure work across sessions for continuity and recovery?

Module Outline

Concepts

  1. Sessions as units of work — Each session has a context window, a goal, and a set of changes. How you scope sessions determines your productivity.

  2. The session sizing problem — Too small: overhead of re-establishing context. Too large: context window fills up, agent loses focus, risk of compounding errors.

  3. Session continuity patterns — How to hand off work between sessions: commit messages as context, session summaries, TODO files, plan files.

  4. The --continue and --resume patterns — When to continue a previous session vs. starting fresh.

  5. Session checkpointing — Committing at natural breakpoints so you can recover from agent mistakes.

  6. Multi-session workflows — Breaking large projects into a sequence of focused sessions with clear handoff points.

  7. The context window budget — Understanding what fills the context window and how to manage it (/compact, targeted file reads, concise prompts).

Tool-Specific Content

  • Claude Code: --continue, --resume, /compact, session history, context management
  • Codex CLI: session management, conversation history, context handling

Exercises

  1. Plan a multi-session workflow for a medium-sized feature
  2. Practice session handoffs: end one session, start another, verify continuity
  3. Context window management: monitor usage, practice /compact, optimize prompts
  4. Recovery drill: intentionally create a messy session, recover using checkpoints

title: "Agent Teams" last_updated: 2026-03-21 status: experimental difficulty: advanced prerequisites: [05-sub-agents, 07-session-architecture]

Agent Teams

Coming in V2. This module is scaffolded with an outline.

Core Question

How do multiple agents collaborate on complex tasks?

Module Outline

Concepts

  1. From sub-agents to teams — Sub-agents handle isolated tasks. Agent teams coordinate across roles: implementer, reviewer, tester, architect.

  2. Team topologies — Patterns for organizing agent collaboration:

    • Pair: implementer + reviewer
    • Pipeline: architect → implementer → tester → reviewer
    • Swarm: multiple agents working on independent parts simultaneously
    • Hierarchy: lead agent delegates to specialist agents
  3. Role definition — Each agent in a team needs a clear role, capabilities, and boundaries. Unclear roles lead to duplication and conflicts.

  4. Coordination mechanisms — How agents share information: shared files, git branches, structured handoff documents, plan files.

  5. Conflict resolution — When agents disagree (e.g., a reviewer rejects an implementer's code). Patterns for resolving this automatically vs. escalating to the human.

  6. The human's role in agent teams — You become the architect and reviewer of the process, not the executor of each step.

  7. Cost and complexity tradeoffs — Agent teams multiply token usage. When the coordination cost exceeds the benefit, use simpler patterns.

Tool-Specific Content

  • Claude Code: Agent teams via custom agents directory, orchestration scripts, worktree-based isolation
  • Codex CLI: Team patterns using Agents SDK, multi-instance coordination

Exercises

  1. Set up a simple pair: implementer agent + review agent
  2. Build a pipeline: plan → implement → test → review
  3. Run a parallel swarm on a multi-file refactoring task
  4. Measure and compare: team vs. single-agent on the same task

title: "Headless and CI/CD" last_updated: 2026-03-21 status: experimental difficulty: advanced prerequisites: [04-hooks-and-commands, 05-sub-agents]

Headless and CI/CD

Coming in V2. This module is scaffolded with an outline.

Core Question

How do agents run without me?

Module Outline

Concepts

  1. Headless mode — Running AI agents without an interactive terminal. The agent receives instructions programmatically and returns results via structured output (JSON).

  2. Why headless matters — Enables automation: agents can run in CI/CD pipelines, cron jobs, PR review bots, and other automated workflows.

  3. The trust boundary — Interactive sessions have a human in the loop. Headless mode does not. This changes the risk profile and requires stronger guardrails.

  4. CI/CD integration patterns:

    • PR review agent: automatically reviews code on pull request
    • Test generation: agent writes tests for changed code
    • Documentation updates: agent updates docs when code changes
    • Migration assistance: agent helps with version upgrades
  5. Guardrails for automated agents — Permission restrictions, output validation, cost limits, timeout controls, sandbox execution.

  6. Structured output — Using JSON output mode for machine-readable results that can be consumed by other tools in your pipeline.

  7. Monitoring automated agents — Logging, cost tracking, success/failure metrics, alerting on unexpected behavior.

Tool-Specific Content

  • Claude Code: --print flag, -p for headless, JSON output, GitHub Actions integration, SDK usage
  • Codex CLI: Non-interactive mode, automation patterns, scripting

Exercises

  1. Run your first headless command and parse the JSON output
  2. Create a GitHub Action that uses an AI agent to review PRs
  3. Build a cron job that generates a daily code quality report
  4. Set up cost monitoring and alerting for automated agent runs

title: "Orchestration Patterns" last_updated: 2026-03-21 status: experimental difficulty: advanced prerequisites: [08-agent-teams, 09-headless-and-ci-cd]

Orchestration Patterns

Coming in V3. This module is scaffolded with an outline.

Core Question

What are the architecture options for coordinating multiple agents?

Module Outline

Concepts

  1. The orchestration spectrum — From simple sequential tasks to complex multi-agent systems. Most developers need the simpler end; know when to reach for the complex end.

  2. Pattern catalog:

    • Pipeline: Agent A → Agent B → Agent C (sequential, each transforms output)
    • Fan-Out/Fan-In: Parent dispatches N agents, collects results, synthesizes
    • Hierarchy: Lead agent delegates to specialist sub-agents
    • Swarm: Multiple autonomous agents with shared goals, minimal coordination
    • Event-Driven: Agents triggered by external events (webhooks, file changes, schedules)
  3. Choosing the right pattern — Decision framework based on: task independence, coordination needs, error tolerance, cost budget.

  4. Orchestration infrastructure — Scripts, SDKs, and frameworks for building agent orchestration beyond the CLI.

  5. State management — How to maintain shared state across agents: files, databases, structured artifacts.

  6. Error handling at scale — When one agent in a 10-agent pipeline fails: retry, skip, rollback, escalate.

  7. Cost optimization — Strategies for reducing token usage in multi-agent systems without sacrificing quality.

Tool-Specific Content

  • Claude Code: Claude Agent SDK, custom orchestration scripts, headless pipelines
  • Codex CLI: OpenAI Agents SDK, multi-instance patterns

Exercises

  1. Implement each pattern from the catalog on a real project task
  2. Build an orchestration script that coordinates 3+ agents
  3. Add error handling and recovery to a multi-agent workflow
  4. Benchmark: compare patterns on the same complex task

title: "Team Adoption" last_updated: 2026-03-21 status: experimental difficulty: advanced prerequisites: [07-session-architecture]

Team Adoption

Coming in V3. This module is scaffolded with an outline.

Core Question

How does my team adopt agentic workflows?

Module Outline

Concepts

  1. The adoption curve — Most teams have a champion, a few early adopters, and a skeptical majority. Each group needs different things.

  2. Shared configuration — Creating team-wide CLAUDE.md / AGENTS.md files that encode team conventions. The config becomes a living style guide.

  3. The onboarding path — A structured sequence for new team members: install → first task → team config → team workflows.

  4. Standardized workflows — Defining team patterns for common tasks: PR review, bug triage, feature development, documentation updates.

  5. Knowledge sharing — How to capture and share what works: team pattern libraries, internal case studies, retrospectives.

  6. Governance — Who can modify team configs? How do you handle conflicting preferences? What's the review process for new patterns?

  7. Measuring impact — Metrics that matter: cycle time, defect rates, developer satisfaction. Metrics that mislead: lines of code, token usage alone.

  8. Common adoption failures — Mandating tools without training, ignoring security concerns, over-automating too fast, not measuring.

Tool-Specific Content

  • Claude Code: Team settings, shared hooks, org-level CLAUDE.md, enterprise features
  • Codex CLI: Team configuration, shared AGENTS.md, organization patterns

Exercises

  1. Audit your current team's development workflow for agentic opportunities
  2. Create a team CLAUDE.md/AGENTS.md that encodes your team's top 10 conventions
  3. Design an onboarding guide for a new team member's first week with agentic tools
  4. Run a team retrospective after 2 weeks of agentic workflow adoption

title: Fan-Out Fan-In slug: fan-out-fan-in category: workflow-pattern status: proven difficulty: intermediate tags: [parallelism, sub-agents, orchestration, throughput] prerequisites: [basic-cli-usage, git-worktrees] estimated_time: 15min to learn, varies per task cost_per_use: "$0.50-$3.00 depending on sub-task count"

Fan-Out Fan-In

Problem

You have a large task that decomposes into independent pieces — migrating 12 API endpoints, reviewing 8 modules, or generating tests for 15 files. Running them sequentially wastes time and money because the agent idles between context switches. You need a way to dispatch parallel work and merge the results.

Solution

  1. Decompose the task into independent units (files, modules, endpoints).
  2. Dispatch a separate agent (or sub-process) for each unit.
  3. Collect results into a shared location (branch, directory, or summary file).
  4. Merge and review the combined output.

Step-by-Step

  1. Identify the list of independent work items.
  2. Write a dispatch script or use shell parallelism (xargs -P, parallel, or background jobs).
  3. Each sub-agent works in its own worktree or output file.
  4. After all complete, review diffs and consolidate.

When to Use

  • Migrating or refactoring many similar files
  • Generating tests across multiple modules
  • Reviewing a large PR split by directory
  • Bulk documentation generation
  • Any task where sub-items share no dependencies

When NOT to Use

  • Tasks with sequential dependencies (step 2 needs step 1's output)
  • When shared state or files would cause merge conflicts
  • Small tasks where orchestration overhead exceeds the work itself
  • When you need tight consistency across all outputs (use a single agent instead)

Example: Claude Code

# Define the files to process
FILES=(
  src/api/users.ts
  src/api/orders.ts
  src/api/products.ts
  src/api/payments.ts
)

# Fan-out: launch a sub-agent per file in the background
for file in "${FILES[@]}"; do
  claude -p "Write unit tests for $file. Output tests to tests/$(basename $file .ts).test.ts. \
    Follow existing test patterns in the repo. Do not modify the source file." \
    --allowedTools Edit,Read,Bash,Glob,Grep &
done

# Fan-in: wait for all sub-agents to finish
wait
echo "All sub-agents complete."

# Review combined results
git diff --stat
claude -p "Review all new test files in tests/. Check for consistency, \
  missing edge cases, and correct imports. Summarize findings."

Example: Codex CLI

# Fan-out with codex using xargs for parallelism
echo "src/api/users.ts
src/api/orders.ts
src/api/products.ts
src/api/payments.ts" | xargs -P 4 -I {} codex -q \
  "Write unit tests for {}. Save to tests/$(basename {} .ts).test.ts."

# Fan-in: review results
codex -q "Review all files in tests/ for consistency and correctness."

Cost Estimate

Sub-tasksApprox Cost per Sub-taskTotal Estimate
4 files~$0.15-$0.30~$0.60-$1.20
8 files~$0.15-$0.30~$1.20-$2.40
12 files~$0.15-$0.30~$1.80-$3.60

Orchestration overhead (dispatch + review pass) adds ~$0.20-$0.40.

Maturity Notes

Status: Proven. This pattern works well for homogeneous tasks (same operation, different targets). Results vary when sub-tasks are heterogeneous or when outputs must be tightly coordinated. Always include a final review/merge pass — sub-agents may produce inconsistent styles or duplicate helper functions.


title: Plan Then Execute slug: plan-then-execute category: workflow-pattern status: battle-tested difficulty: beginner tags: [planning, plan-mode, structured-approach, reliability] prerequisites: [basic-cli-usage] estimated_time: 5min to learn, immediate use cost_per_use: "$0.05-$0.50"

Plan Then Execute

Problem

Complex tasks fail when the agent starts coding immediately. It picks a suboptimal approach, gets halfway through, realizes the design is wrong, and either backtracks messily or produces tangled code. The larger the task, the more likely this happens. You need the agent to think before it acts.

Solution

Use a two-phase approach: planning with no code changes, then execution of the approved plan.

Step-by-Step

  1. Plan phase: Ask the agent to outline its approach. Specify "do not write any code yet."
  2. Review: Read the plan. Ask clarifying questions. Suggest adjustments.
  3. Approve: Confirm the plan or iterate until it is right.
  4. Execute: Tell the agent to implement the approved plan step by step.
  5. Checkpoint: After each major step, verify progress matches the plan.

When to Use

  • Any task touching more than 2-3 files
  • Architectural changes or new features
  • Refactoring with multiple moving parts
  • When you are unsure of the best approach yourself
  • When the agent has failed on a first attempt

When NOT to Use

  • Simple, well-defined single-file edits
  • Tasks where you already have a precise specification
  • Quick exploratory/throwaway work

Example: Claude Code

# Phase 1: Plan (using plan mode if available)
claude -p "I need to add WebSocket support to our Express API server. \
  Current REST endpoints are in src/routes/. \
  Plan the implementation: which files to create, which to modify, \
  what libraries to use, and in what order. \
  Do NOT write any code yet. Output a numbered step-by-step plan."

# Or use the --plan flag if your version supports it:
# claude --plan "Add WebSocket support to the Express API server."

# Phase 2: Execute the plan
claude -p "Implement the following plan for adding WebSocket support. \
  Work through each step in order. Commit after each major step.

  Plan:
  1. Install ws library (npm install ws)
  2. Create src/websocket/server.ts — WebSocket server setup
  3. Create src/websocket/handlers.ts — message handlers
  4. Modify src/index.ts — attach WS server to HTTP server
  5. Add tests in tests/websocket.test.ts
  6. Update README with WebSocket docs"
# Interactive version with natural back-and-forth
claude

# > Plan how to add WebSocket support to this Express app.
# > Don't write code yet, just outline the approach.
# (review the plan)
# > Good plan, but use Socket.IO instead of raw ws. Update the plan.
# (review again)
# > Approved. Implement it step by step. Commit after each step.

Example: Codex CLI

# Phase 1: Plan
codex -q "Plan how to add WebSocket support to this Express API. \
  List files to create and modify in order. Do not write code yet." \
  > implementation-plan.txt

cat implementation-plan.txt

# Phase 2: Execute
codex -q "Execute this implementation plan step by step:
$(cat implementation-plan.txt)"

Cost Estimate

PhaseTypical Cost
Planning~$0.03-$0.10
Execution~$0.10-$0.80
Total~$0.13-$0.90

Planning adds minimal cost (often under 10% of total) but dramatically reduces wasted execution from wrong approaches.

Maturity Notes

Status: Battle-tested. This is the single most impactful pattern for improving agent reliability on non-trivial tasks. Teams that adopt plan-then-execute report 40-60% fewer "start over" moments. The plan serves double duty as documentation of what was done and why. Works across all agent tools and models.


title: Review Then Fix slug: review-then-fix category: workflow-pattern status: proven difficulty: beginner tags: [code-review, two-pass, quality, bug-fixing] prerequisites: [basic-cli-usage] estimated_time: 10min to learn, varies per task cost_per_use: "$0.10-$0.80"

Review Then Fix

Problem

When you ask an agent to "fix the bugs in this file," it jumps straight to editing without understanding the full picture. It may fix one issue while introducing another, or miss systemic problems because it never stepped back to assess. You need a two-pass approach: understand first, then act.

Solution

Separate the task into two distinct phases with an explicit boundary between them.

Step-by-Step

  1. Pass 1 — Review: Ask the agent to read the code and produce a written list of issues. No edits allowed.
  2. Checkpoint: You read the review. Approve, adjust, or prioritize the findings.
  3. Pass 2 — Fix: Feed the approved issue list back to the agent and ask it to fix each one.
  4. Verify: Run tests or ask for a final diff review.

When to Use

  • Fixing bugs in unfamiliar code
  • Cleaning up code you inherited
  • Addressing PR review comments systematically
  • Security audits or performance reviews
  • Any time the agent's first attempt at a fix was wrong

When NOT to Use

  • Trivial one-line fixes where the problem is obvious
  • When you already have a precise list of changes to make
  • Time-critical hotfixes where speed matters more than thoroughness

Example: Claude Code

# Pass 1: Review only — no edits
claude -p "Review src/auth/login.ts for bugs, security issues, and code smells. \
  Do NOT make any changes. Output a numbered list of issues with line numbers \
  and severity (critical/warning/info)." > review-findings.txt

# Read the findings yourself
cat review-findings.txt

# Pass 2: Fix approved issues
claude -p "Fix the following issues in src/auth/login.ts. Make minimal, \
  targeted changes. After each fix, explain what you changed and why.

  Issues to fix:
  $(cat review-findings.txt)"
# Interactive version (single session, two phases)
claude

# In the session:
# > Review src/auth/login.ts for bugs and security issues.
# > Output a numbered list. Do not edit anything yet.
# (read the list, then:)
# > Fix issues 1, 3, and 5. Skip issues 2 and 4 for now.

Example: Codex CLI

# Pass 1: Review (read-only mode is the default in codex)
codex -q "Review src/auth/login.ts for bugs and security issues. \
  List each issue with its line number and severity." > review-findings.txt

# Pass 2: Fix
codex -q "Fix these issues in src/auth/login.ts:
$(cat review-findings.txt)"

Cost Estimate

PhaseTypical Cost
Review~$0.05-$0.20
Fix~$0.10-$0.50
Total~$0.15-$0.70

The two-pass approach costs ~30% more than a single pass but catches significantly more issues and produces cleaner fixes.

Maturity Notes

Status: Proven. This is one of the most reliable patterns for code quality work. The key insight is that LLMs produce better fixes when they have already articulated the problems in writing. The review phase forces structured reasoning before action. Works best when you actively curate the review findings before the fix phase.


title: Checkpoint Commit slug: checkpoint-commit category: workflow-pattern status: battle-tested difficulty: beginner tags: [git, safety, rollback, long-tasks, risk-management] prerequisites: [basic-cli-usage, basic-git] estimated_time: 5min to learn, immediate use cost_per_use: "$0.00 (workflow habit)"

Checkpoint Commit

Problem

Long agent tasks are risky. The agent works well for 10 steps, then makes a bad decision on step 11 that corrupts files modified in steps 7-10. Without checkpoints, your only options are to manually untangle the damage or start over entirely. You need rollback points so that agent mistakes are cheap to recover from.

Solution

Instruct the agent to commit at natural breakpoints during multi-step tasks. Each commit is a save point you can revert to.

Step-by-Step

  1. Include checkpoint instructions in your prompt or CLAUDE.md.
  2. Define breakpoints: after each file, after each logical step, or after each passing test.
  3. Use descriptive commit messages so you can identify good rollback points.
  4. If something goes wrong: git log --oneline to find the last good checkpoint, then git reset.

When to Use

  • Any task the agent will spend more than 5 minutes on
  • Multi-file refactors
  • Migrations or upgrades
  • Any task where you have said "I wish I could undo that"
  • When running agents in headless/background mode

When NOT to Use

  • Quick single-file edits
  • Exploratory/throwaway work on a scratch branch
  • When you prefer to squash everything into one commit at the end (but still checkpoint on a temp branch)

Example: Claude Code

# Option 1: Instruct in the prompt
claude -p "Migrate all API endpoints from Express to Fastify. \
  Work through one route file at a time. \
  After each file is migrated and its tests pass, make a git commit \
  with the message 'checkpoint: migrate <filename> to Fastify'. \
  Do not move to the next file until the current one is committed."

# Option 2: Add to CLAUDE.md for all tasks
cat >> CLAUDE.md << 'EOF'

## Git Workflow
- During multi-step tasks, commit after each logical step.
- Use the prefix "checkpoint:" for intermediate commits.
- Always run tests before committing.
- Never amend a previous checkpoint commit.
EOF

# Recovery when something goes wrong
git log --oneline -10
# a1b2c3d checkpoint: migrate payments.ts to Fastify   <-- last good
# d4e5f6g checkpoint: migrate orders.ts to Fastify      <-- broken
git reset --soft a1b2c3d   # keep changes staged so you can inspect
git diff --cached           # see what the bad step did
git reset --hard a1b2c3d   # discard the bad changes entirely
# Interactive session with manual checkpoints
claude

# > Refactor the auth module. Start with src/auth/middleware.ts.
# (agent finishes middleware.ts)
# > Commit this as a checkpoint before moving on.
# (agent commits)
# > Now refactor src/auth/tokens.ts.
# (agent makes a mess)
# > Stop. Revert to the last checkpoint and try tokens.ts again
#   with a different approach.

Example: Codex CLI

# Codex with checkpoint instructions
codex -q "Migrate route files in src/routes/ from Express to Fastify. \
  Process one file at a time. After each file, run tests and commit \
  with message 'checkpoint: migrate <file>'."

# Recovery
git log --oneline -10
git reset --hard <last-good-commit>

Cost Estimate

ActivityCost
Checkpoint commits$0.00
Recovery (git reset)$0.00
Re-running failed step~$0.05-$0.20

Checkpoints are free. The cost savings come from not re-running the entire task when only the last step failed.

Maturity Notes

Status: Battle-tested. This is the single most important safety pattern for long-running agent tasks. Every experienced CLI agent user adopts some form of checkpointing. The main pitfall is agents that commit broken code — always include "run tests before committing" in your instructions. Some teams use a dedicated checkpoint/ branch prefix and squash-merge to main when the full task succeeds.


title: Explore Before Change slug: explore-before-change category: workflow-pattern status: battle-tested difficulty: beginner tags: [code-reading, understanding, safety, context-gathering] prerequisites: [basic-cli-usage] estimated_time: 5min to learn, immediate use cost_per_use: "$0.02-$0.10 for exploration phase"

Explore Before Change

Problem

Agents that jump straight into editing code without reading the surrounding context produce fragile, inconsistent changes. They duplicate existing utilities, violate local conventions, break implicit contracts between modules, and miss related code that also needs updating. The cheapest fix is prevention: make the agent read before it writes.

Solution

Always include an explicit exploration phase before any code modification. The agent must understand the existing code, conventions, and dependencies before making changes.

Step-by-Step

  1. Scope the exploration: Tell the agent which files, directories, or patterns to examine.
  2. Read, do not edit: The agent reads relevant files and summarizes what it finds.
  3. Confirm understanding: Review the agent's summary. Correct any misunderstandings.
  4. Proceed to changes: Only then allow the agent to modify code.

When to Use

  • Editing code in an unfamiliar codebase
  • Modifying a module you did not write
  • Any change that touches shared utilities or interfaces
  • First task in a new session (the agent has no memory of previous sessions)
  • When the agent has made incorrect assumptions in the past

When NOT to Use

  • Files you just created in this session (the agent already has full context)
  • Appending to a file where surrounding context is irrelevant
  • Trivial changes like updating a version number

Example: Claude Code

# Explicit two-phase prompt
claude -p "I need to add rate limiting to our API. Before making any changes:

Phase 1 — Explore:
1. Read src/middleware/ to understand existing middleware patterns.
2. Read src/routes/index.ts to see how middleware is applied.
3. Check package.json for any existing rate-limiting libraries.
4. Summarize what you found and propose an approach.

Do NOT edit any files until I confirm your approach."

# After reviewing the summary:
claude -p "Your approach looks good. Implement rate limiting following \
  the patterns you found in the existing middleware."
# Add to CLAUDE.md as a standing rule
cat >> CLAUDE.md << 'EOF'

## Working with Code
- Before modifying any file, first read it completely and read its tests.
- Before adding a utility function, grep the codebase for existing ones.
- Before adding a dependency, check package.json for alternatives already installed.
EOF
# Interactive session with exploration
claude

# > I need to fix the authentication bug in the login flow.
# > First, read through src/auth/ and src/middleware/auth.ts.
# > Tell me how the current auth flow works before changing anything.
# (agent reads and explains)
# > Good. Now read the failing test in tests/auth.test.ts.
# (agent reads and explains the failure)
# > Now fix the bug using what you've learned.

Example: Codex CLI

# Exploration pass (read-only by default in codex)
codex -q "Read src/middleware/ and src/routes/index.ts. \
  Explain how middleware is structured and applied in this project." \
  > exploration-notes.txt

cat exploration-notes.txt

# Change pass
codex -q "Add rate limiting middleware following the patterns described here:
$(cat exploration-notes.txt)"

Cost Estimate

PhaseTypical Cost
Exploration~$0.02-$0.10
Modification~$0.05-$0.40
Total~$0.07-$0.50

Exploration is cheap (mostly input tokens from reading files). It prevents expensive re-runs from bad assumptions.

Maturity Notes

Status: Battle-tested. This is foundational agent hygiene. The pattern is so reliable that many teams encode it directly into their CLAUDE.md as a standing instruction. The main failure mode is agents that skim rather than read — use specific instructions like "read the entire file" or "list all functions in this module" to force thorough exploration. Pairs naturally with Review Then Fix and Plan Then Execute.


title: Progressive Disclosure slug: progressive-disclosure category: workflow-pattern status: proven difficulty: beginner tags: [CLAUDE.md, configuration, iterative, context-management] prerequisites: [basic-cli-usage, claude-md-basics] estimated_time: 10min to learn, ongoing practice cost_per_use: "$0.00 (configuration only)"

Progressive Disclosure

Problem

Writing a massive CLAUDE.md up front is guesswork — you do not know what the agent needs until it fails. A 500-line instruction file wastes context tokens on rules the agent may never need, and you still miss the rules it actually does need. You need a way to build configuration organically from real failures.

Solution

Start with a minimal CLAUDE.md and grow it only when the agent makes a mistake you want to prevent next time.

Step-by-Step

  1. Start minimal: Create a CLAUDE.md with only project basics (language, build commands, test commands).
  2. Work normally: Use the agent for real tasks.
  3. Observe failures: When the agent does something wrong, note the correction you give it.
  4. Promote to CLAUDE.md: If you correct the same thing twice, add a rule to CLAUDE.md.
  5. Prune periodically: Remove rules that are no longer relevant or that the agent consistently follows without prompting.

When to Use

  • Starting a new project with CLI agents
  • Onboarding CLI agents to an existing codebase
  • When your CLAUDE.md has grown stale or bloated
  • When the agent keeps making the same mistake

When NOT to Use

  • You already have a well-tuned CLAUDE.md that works
  • One-off tasks where you will not reuse the configuration
  • Team settings where CLAUDE.md is managed centrally (coordinate first)

Example: Claude Code

# Step 1: Start with a minimal CLAUDE.md
cat > CLAUDE.md << 'EOF'
# Project: my-api

## Stack
- TypeScript, Node.js 20, Express
- PostgreSQL with Prisma ORM
- Jest for testing

## Commands
- Build: `npm run build`
- Test: `npm test`
- Lint: `npm run lint`
EOF

# Step 2: Work normally. The agent makes a mistake:
claude -p "Add a new endpoint for user preferences."
# Agent uses raw SQL instead of Prisma. You correct it in session.

# Step 3: It happens again on the next task. Promote to CLAUDE.md:
cat >> CLAUDE.md << 'EOF'

## Rules
- Always use Prisma ORM for database queries. Never write raw SQL.
EOF

# Step 4: Over time, your CLAUDE.md grows organically:
cat >> CLAUDE.md << 'EOF'
- Use zod for request validation, not manual checks.
- Error responses must use the ApiError class from src/errors.ts.
- New endpoints need tests in tests/routes/ following existing patterns.
EOF
# Audit your CLAUDE.md periodically
claude -p "Review our CLAUDE.md file. For each rule, check if it is \
  still relevant to the current codebase. Flag any rules that reference \
  files, patterns, or libraries that no longer exist."

Example: Codex CLI

# Codex uses AGENTS.md instead of CLAUDE.md — same principle applies
cat > AGENTS.md << 'EOF'
# Project: my-api
- TypeScript, Express, Prisma ORM
- Run tests: npm test
- Always use Prisma for DB access
EOF

# After repeated corrections, append:
echo "- Use zod for validation, not manual checks." >> AGENTS.md

Cost Estimate

ActivityCost
Initial setup$0.00
Each rule addition$0.00
Periodic audit~$0.05

This pattern saves money over time by reducing corrections and re-runs caused by preventable agent mistakes.

Maturity Notes

Status: Proven. This pattern reflects how experienced CLI agent users actually build their configurations. The key discipline is the "two-strike rule" — do not add a rule after a single mistake (it might be a fluke), but always add one after the second occurrence. Over 2-4 weeks of active use, you will build a CLAUDE.md that precisely matches your project's needs.


title: Test-First Agent slug: test-first-agent category: workflow-pattern status: proven difficulty: intermediate tags: [TDD, testing, test-driven, quality, verification] prerequisites: [basic-cli-usage, testing-basics] estimated_time: 10min to learn, varies per task cost_per_use: "$0.15-$1.00"

Test-First Agent

Problem

When an agent writes code and tests together, the tests tend to validate what the code does rather than what it should do. The agent unconsciously writes tests that pass, not tests that verify correctness. If there is a bug in the implementation, the test enshrines it. You need the tests to be an independent specification that the implementation must satisfy.

Solution

Write tests first with the agent, lock them in, then implement code to pass them. The test suite becomes a contract that constrains the implementation.

Step-by-Step

  1. Specify behavior: Describe the desired behavior in plain language.
  2. Generate tests: Ask the agent to write tests based on the specification, with no implementation.
  3. Review tests: Verify the tests capture your intended behavior and edge cases.
  4. Lock tests: Commit the tests. They are now the acceptance criteria.
  5. Implement: Ask the agent to write code that makes all tests pass.
  6. Verify: Run the test suite. If tests fail, the agent fixes the implementation (not the tests).

When to Use

  • Building new features with clear behavioral requirements
  • Replacing or rewriting existing functionality
  • When correctness matters more than speed
  • When you can articulate the expected behavior but not the implementation
  • Fixing bugs (write a failing test first, then fix the code)

When NOT to Use

  • Exploratory prototyping where requirements are unclear
  • UI/visual work where tests are hard to write meaningfully
  • Performance optimization (behavior tests already exist, you need benchmarks)
  • Trivial CRUD with no business logic

Example: Claude Code

# Step 1: Generate tests from a specification
claude -p "Write Jest tests for a new PricingCalculator class with these rules:
  - Base price comes from the product catalog
  - 10% discount for orders over \$100
  - 20% discount for premium members
  - Discounts do not stack (use the higher one)
  - Tax is applied after discounts (rate varies by state)
  - Minimum final price is \$1.00

  Write thorough tests covering all rules and edge cases.
  Create the test file at tests/pricing-calculator.test.ts.
  Do NOT create the implementation file."

# Step 2: Review and commit the tests
cat tests/pricing-calculator.test.ts
git add tests/pricing-calculator.test.ts
git commit -m "test: add PricingCalculator specification tests"

# Step 3: Verify tests fail (no implementation yet)
npm test -- tests/pricing-calculator.test.ts
# Expected: all tests fail

# Step 4: Implement to pass the tests
claude -p "Implement src/pricing-calculator.ts to make all tests in \
  tests/pricing-calculator.test.ts pass. \
  Do NOT modify the test file. \
  Run 'npm test -- tests/pricing-calculator.test.ts' after implementation \
  to verify all tests pass."
# Bug fix workflow: test-first
claude -p "There's a bug: premium members are charged full price on orders \
  over \$100. Write a failing test in tests/pricing-calculator.test.ts that \
  demonstrates this bug. Do NOT fix the implementation yet."

# Verify the test fails
npm test -- tests/pricing-calculator.test.ts

# Now fix
claude -p "Fix src/pricing-calculator.ts so the new failing test passes. \
  Do not modify any tests. All existing tests must continue to pass."

Example: Codex CLI

# Generate tests first
codex -q "Write Jest tests for a PricingCalculator class. Rules:
  - 10% discount over \$100, 20% for premium, no stacking.
  - Tax after discounts, minimum \$1.00.
  Save to tests/pricing-calculator.test.ts. Do NOT create implementation."

# Lock the tests
git add tests/pricing-calculator.test.ts && git commit -m "test: pricing spec"

# Implement
codex -q "Implement src/pricing-calculator.ts to pass all tests in \
  tests/pricing-calculator.test.ts. Do not modify the tests."

Cost Estimate

PhaseTypical Cost
Test generation~$0.05-$0.20
Test review$0.00 (human)
Implementation~$0.10-$0.60
Fix iterations~$0.05-$0.20
Total~$0.20-$1.00

Maturity Notes

Status: Proven. This pattern produces the highest-quality agent-generated code. The main challenge is writing good test specifications — vague specs produce vague tests. Be explicit about edge cases, error conditions, and boundary values. Some practitioners write the test descriptions (test names and comments) by hand and let the agent fill in assertions. Works best with well-established test frameworks where the agent can follow patterns.


title: Session Handoff slug: session-handoff category: workflow-pattern status: proven difficulty: intermediate tags: [continuity, context, sessions, multi-session, handoff] prerequisites: [basic-cli-usage, basic-git] estimated_time: 10min to learn, ongoing practice cost_per_use: "$0.02-$0.10 per handoff"

Session Handoff

Problem

CLI agent sessions are stateless — each new session starts with zero memory of what happened before. When you resume work the next day, the new agent instance does not know what was completed, what was attempted and failed, or what decisions were made. You waste time and tokens re-explaining context, and the agent may redo work or contradict previous decisions.

Solution

End every non-trivial session by writing a structured handoff artifact — a summary commit message, a TODO file, or a status document — that gives the next session complete context to continue seamlessly.

Step-by-Step

  1. Before ending a session: Ask the agent to summarize the current state.
  2. Capture: Write the summary to a known location (commit message, TODO.md, or .session-state file).
  3. Include key details: What was done, what is in progress, what is blocked, what decisions were made and why.
  4. Next session: Start by pointing the agent at the handoff artifact.

When to Use

  • Multi-day tasks that span multiple sessions
  • Handing off work between team members using shared agents
  • Before taking a break from a complex task
  • Any time you think "I need to remember where I left off"
  • Before switching between different tasks in the same repo

When NOT to Use

  • Single-session tasks that complete fully
  • Throwaway exploratory work
  • When the git log already tells the full story

Example: Claude Code

# End-of-session: generate handoff summary
claude -p "We're ending this session. Write a handoff summary to \
  .claude/session-handoff.md with the following sections:
  1. Completed: what was finished this session
  2. In Progress: what is partially done (include file paths and line numbers)
  3. Blocked: anything that needs human input or external action
  4. Decisions: key decisions made and their rationale
  5. Next Steps: ordered list of what to do next
  6. Gotchas: anything surprising or tricky discovered"

# Commit the handoff
git add .claude/session-handoff.md
git commit -m "session: end-of-day handoff — auth refactor 60% complete

Completed: middleware refactor, token validation rewrite.
In progress: session management (src/auth/sessions.ts half done).
Next: finish sessions.ts, then update integration tests.
Blocked: need decision on Redis vs Postgres for session store."
# Start-of-session: load context from handoff
claude -p "Read .claude/session-handoff.md to understand where we left off. \
  Summarize the current state and confirm what you'll work on next. \
  Do not start coding until I confirm the plan."
# Alternative: use a TODO file as the living handoff doc
claude -p "Update TODO.md with the current status of all tasks. \
  Mark completed items with [x]. Add notes on anything tricky."
# Team handoff via commit message
git log --format="%h %s" -5
# a1b2c3d session: auth refactor progress — see commit body for details
git log -1 --format="%B" a1b2c3d
# Shows the full handoff summary in the commit body

Example: Codex CLI

# End-of-session handoff
codex -q "Write a session handoff to .session-state.md covering:
  completed work, in-progress items, decisions made, and next steps."

git add .session-state.md && git commit -m "session: handoff for billing feature"

# Next session start
codex -q "Read .session-state.md. Summarize where we left off and \
  what to work on next."

Cost Estimate

ActivityTypical Cost
Generate handoff~$0.02-$0.08
Load handoff~$0.01-$0.03
Per transition~$0.03-$0.10

Minimal cost that prevents expensive context re-discovery (which can cost $0.20-$0.50 in wasted exploration).

Maturity Notes

Status: Proven. This pattern addresses one of the most common pain points with CLI agents — session discontinuity. The structured format matters: free-form summaries tend to miss critical details. The six-section template (Completed, In Progress, Blocked, Decisions, Next Steps, Gotchas) covers the information a new session actually needs. Some teams automate this by adding a session-end hook or alias. Works especially well combined with Checkpoint Commits.


title: Parallel Worktree slug: parallel-worktree category: workflow-pattern status: experimental difficulty: advanced tags: [git-worktrees, parallelism, isolation, multi-agent, advanced] prerequisites: [basic-cli-usage, git-advanced, fan-out-fan-in] estimated_time: 20min to learn, setup per project cost_per_use: "$0.50-$5.00 depending on parallelism"

Parallel Worktree

Problem

When multiple agent instances work in the same repository simultaneously, they create file conflicts. One agent's uncommitted changes interfere with another's reads. Git branches alone do not solve this because the working directory is shared. You need full filesystem isolation so each agent operates on its own copy of the code without affecting others.

Solution

Use git worktree to create separate working directories for each sub-agent. Each worktree shares the same Git history but has its own independent filesystem, so agents can work in parallel without conflicts.

Step-by-Step

  1. Create worktrees: One per parallel task, each on its own branch.
  2. Dispatch agents: Each agent runs in its own worktree directory.
  3. Work in parallel: Agents read and write files without interfering.
  4. Merge results: Bring branches together via merge or cherry-pick.
  5. Clean up: Remove worktrees when done.

When to Use

  • Running 3+ agents simultaneously on the same repo
  • Tasks that modify overlapping files in different ways
  • Large refactors split by module or feature
  • CI/CD pipelines that run agent tasks in parallel
  • When the Fan-Out Fan-In pattern hits merge conflicts

When NOT to Use

  • Tasks that can run sequentially without time pressure
  • Repos too large to have multiple worktrees on disk
  • When sub-tasks are trivially isolated (separate output files)
  • Teams unfamiliar with git worktrees (learn worktrees first)

Example: Claude Code

#!/bin/bash
# parallel-refactor.sh — Refactor 3 modules in parallel using worktrees

REPO_ROOT=$(git rev-parse --toplevel)
BASE_BRANCH=$(git branch --show-current)
MODULES=("auth" "billing" "notifications")

# Step 1: Create worktrees
for module in "${MODULES[@]}"; do
  git worktree add \
    "${REPO_ROOT}/../worktrees/refactor-${module}" \
    -b "refactor/${module}" \
    "$BASE_BRANCH"
done

# Step 2: Dispatch agents in parallel
for module in "${MODULES[@]}"; do
  WORKTREE="${REPO_ROOT}/../worktrees/refactor-${module}"
  claude -p "Refactor the ${module} module in src/${module}/ to use \
    the new BaseService pattern. Follow the example in src/users/service.ts. \
    Run tests after refactoring. Commit your changes." \
    --cwd "$WORKTREE" &
done

# Step 3: Wait for all agents to finish
wait
echo "All agents complete."

# Step 4: Merge results back
git checkout "$BASE_BRANCH"
for module in "${MODULES[@]}"; do
  echo "Merging refactor/${module}..."
  git merge "refactor/${module}" --no-edit
  if [ $? -ne 0 ]; then
    echo "Conflict merging ${module}. Resolve manually."
    break
  fi
done

# Step 5: Clean up worktrees
for module in "${MODULES[@]}"; do
  git worktree remove "${REPO_ROOT}/../worktrees/refactor-${module}"
  git branch -d "refactor/${module}"
done
# Quick two-worktree setup for a single pair of tasks
git worktree add ../wt-frontend -b task/frontend main
git worktree add ../wt-backend -b task/backend main

# Run agents
claude -p "Update the React components in src/components/." --cwd ../wt-frontend &
claude -p "Update the API handlers in src/api/." --cwd ../wt-backend &
wait

# Merge
git merge task/frontend --no-edit
git merge task/backend --no-edit

# Clean up
git worktree remove ../wt-frontend && git branch -d task/frontend
git worktree remove ../wt-backend && git branch -d task/backend

Example: Codex CLI

# Codex with worktrees
MODULES=("auth" "billing" "notifications")
REPO_ROOT=$(git rev-parse --toplevel)

for module in "${MODULES[@]}"; do
  git worktree add "../wt-${module}" -b "refactor/${module}" main
  codex -q "Refactor src/${module}/ to use BaseService pattern. \
    Commit changes." --cwd "../wt-${module}" &
done

wait

# Merge all branches
for module in "${MODULES[@]}"; do
  git merge "refactor/${module}" --no-edit
  git worktree remove "../wt-${module}"
  git branch -d "refactor/${module}"
done

Cost Estimate

ComponentTypical Cost
Per worktree agent~$0.15-$0.80
3 parallel agents~$0.45-$2.40
Merge review pass~$0.10-$0.30
Total (3 tasks)~$0.55-$2.70

Wall-clock time is divided by the number of parallel agents, but total token cost is the same as sequential execution.

Maturity Notes

Status: Experimental. Git worktrees are a mature Git feature, but orchestrating multiple CLI agents across worktrees is still an emerging practice. Key risks: merge conflicts when modules share interfaces, inconsistent changes across worktrees, and disk space for large repos. Mitigate by choosing truly independent modules and including a final consistency-review pass. The --cwd flag support varies by agent tool version — verify it works in your setup before scripting.


title: Rubber Duck Agent slug: rubber-duck-agent category: workflow-pattern status: proven difficulty: beginner tags: [thinking, design, exploration, rubber-ducking, conversation] prerequisites: [basic-cli-usage] estimated_time: 5min to learn, immediate use cost_per_use: "$0.03-$0.20"

Rubber Duck Agent

Problem

You jump straight into coding a solution before fully understanding the problem. The agent happily helps you implement the first approach you describe, even if it is suboptimal. Hours later, you realize a simpler design existed, or you missed a constraint that invalidates your approach. Traditional rubber-duck debugging uses an inanimate object — but an agent can actually respond, challenge assumptions, and suggest alternatives.

Solution

Before writing any code, use the agent as a thinking partner. Explain the problem, explore the design space, and only implement after you have a clear, validated approach.

Step-by-Step

  1. Explain the problem: Describe what you are trying to achieve, not how.
  2. Explore constraints: Ask the agent to identify edge cases, risks, and tradeoffs.
  3. Consider alternatives: Ask for 2-3 different approaches with pros and cons.
  4. Challenge your assumptions: Tell the agent to poke holes in your preferred approach.
  5. Decide: Pick an approach based on the discussion.
  6. Implement: Now code with confidence.

When to Use

  • Before starting any feature that will take more than an hour to build
  • When you are stuck and cannot see a clear path forward
  • When choosing between multiple technical approaches
  • Debugging a problem you do not understand yet
  • Architecture and design decisions
  • When you want a second opinion before committing to an approach

When NOT to Use

  • You already know exactly what to build and how
  • The task is trivial and does not warrant discussion
  • You need to ship immediately (but even then, 5 minutes of thinking often saves hours)

Example: Claude Code

# Design discussion before implementation
claude

# > I need to add real-time notifications to our app. Users should see
# > notifications in the browser without refreshing. We currently have a
# > REST API with Express and a React frontend. Our scale is ~5000
# > concurrent users.
# >
# > Before we write any code, help me think through this:
# > 1. What are the main approaches (polling, SSE, WebSockets)?
# > 2. What are the tradeoffs at our scale?
# > 3. What infrastructure changes does each require?
# > 4. What's the simplest approach that meets our needs?
# Debugging an issue you don't understand
claude -p "I have a bug I don't understand yet. Don't try to fix it — \
  help me think through it first.

  Symptom: Our API returns 200 OK but the response body is empty \
  about 5% of the time. It happens across all endpoints.

  What I've checked:
  - Logs show the handler completes successfully
  - Database queries return data
  - It happens on all server instances

  Questions:
  1. What could cause a 200 with empty body intermittently?
  2. What should I look at next?
  3. What's the most likely category of bug (networking, middleware, serialization)?"
# Architecture decision record
claude -p "I need to decide between these approaches for our caching layer. \
  Act as a senior engineer reviewing my options. Be critical.

  Option A: Redis cache in front of Postgres
  Option B: Postgres materialized views
  Option C: In-memory cache with LRU eviction

  Context: Read src/db/ and src/api/ to understand our current data access \
  patterns. Then give me a recommendation with reasoning."

Example: Codex CLI

# Design exploration
codex -q "Before implementing anything, help me think through adding \
  real-time notifications to this Express + React app. \
  Compare polling vs SSE vs WebSockets for ~5000 concurrent users. \
  Give pros, cons, and a recommendation."

# Problem analysis
codex -q "I have an intermittent empty response body bug. \
  Read src/middleware/ and src/routes/. \
  What could cause 200 OK with empty body 5% of the time? \
  List hypotheses ranked by likelihood."

Cost Estimate

ActivityTypical Cost
Problem exploration~$0.03-$0.10
Alternatives analysis~$0.05-$0.15
With code reading~$0.08-$0.20

The cheapest pattern in terms of cost, and often the highest ROI. A $0.10 thinking session can prevent a $5.00 wasted implementation.

Maturity Notes

Status: Proven. This is arguably the most underused pattern. Developers instinctively reach for "write code" mode, but the agent is often more valuable as a thinking partner than as a code generator. Key success factors: (1) resist the urge to ask for code too early, (2) explicitly tell the agent not to write code yet, (3) ask it to challenge your assumptions rather than just agree with you. Some practitioners start every session with a 2-minute rubber-duck phase regardless of task complexity.


title: "Anti-Pattern: The God Session" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites:

  • Basic experience with any agentic coding tool

The God Session

What It Looks Like

You open a single session and ask the agent to refactor the auth module, add tests, update the API docs, fix three bugs, and deploy -- all without ever starting fresh.

Why Developers Do This

It feels efficient. You're "in the zone." Switching sessions seems like overhead. The agent remembers everything you've discussed... right?

Why It Fails

Every agentic tool has a finite context window. As the session grows, the agent starts losing early details. Instructions from prompt #3 get crowded out by prompt #30. The agent begins contradicting its own earlier work, misremembering file names, or silently dropping requirements.

The Symptoms

  • Agent "forgets" changes it made 20 minutes ago
  • Quality of output degrades visibly over time
  • Agent starts hallucinating file names or function signatures
  • You spend more time correcting than coding

What to Do Instead

One session, one focused task. Break large efforts into discrete units.

# Wrong: one mega-session
claude  # then proceed to ask for 15 different things

# Right: scoped sessions
claude "refactor auth module to use JWT"
# finish, review, commit
claude "add unit tests for the JWT auth module"
# finish, review, commit
claude "update API docs to reflect new auth flow"

Use /compact if a session runs long, and lean on CLAUDE.md to carry important context between sessions rather than keeping one session alive forever.


title: "Anti-Pattern: Over-Prompting" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites:

  • Basic experience with any agentic coding tool

Over-Prompting

What It Looks Like

You write a 300-word prompt explaining exactly how to add a field to a form, describing HTML structure, CSS classes, state management approach, validation logic, and error message wording -- when the agent already knows your framework.

Why Developers Do This

Habits from working with earlier, less capable models. Fear that the agent will get it wrong without exhaustive detail. A belief that more words equals more precision.

Why It Fails

Long prompts burn context tokens on instructions instead of reserving them for code. Worse, conflicting details buried in walls of text confuse the agent. It tries to honor every micro-instruction and produces awkward, over-constrained code that satisfies the letter of your prompt but misses the spirit.

The Symptoms

  • Prompts longer than the code you want generated
  • Agent output feels rigid and over-literal
  • You spend more time writing prompts than reviewing output
  • Small changes require rewriting the entire prompt

What to Do Instead

State the goal and constraints. Let the agent figure out the implementation.

# Wrong
claude "Create a React component called EmailField using a controlled input
with type='email', a useState hook named emailValue initialized to empty
string, an onChange handler that calls setEmailValue with e.target.value,
and a regex validation on blur using /^[^\s@]+@[^\s@]+\.[^\s@]+$/..."

# Right
claude "Add an email input field to the signup form with client-side validation"

If the output misses something, give targeted follow-up feedback rather than front-loading every detail.


title: "Anti-Pattern: Blind Trust" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites:

  • Basic experience with any agentic coding tool

Blind Trust

What It Looks Like

The agent generates a database migration, you glance at it for two seconds, hit approve, and run it in production. Or the agent refactors a module, all tests pass, and you merge without reading the diff.

Why Developers Do This

The agent's output looks clean and professional. Tests pass. The code compiles. It feels like a waste of time to review what an AI already "thought through." There is also a psychological anchoring effect -- confident-sounding output feels correct.

Why It Fails

Agentic tools are confident but not infallible. They can introduce subtle bugs: off-by-one errors, incorrect null handling, security vulnerabilities, or logic that works for the happy path but breaks on edge cases. They may also delete code they consider unnecessary but that handles an obscure requirement only you know about.

The Symptoms

  • Bugs appearing in code "the agent wrote and tested"
  • Security issues in generated code (SQL injection, missing auth checks)
  • Subtle regressions that pass existing tests but break real workflows
  • Deleted edge-case handling

What to Do Instead

Treat agent output like a junior developer's pull request: review every diff.

# Wrong: auto-approve everything
# (running in full-auto mode for unfamiliar code)

# Right: review the diff before accepting
claude "add rate limiting to the /api/login endpoint"
# Agent proposes changes -> read the diff carefully
# Check: correct middleware? right limits? proper error response?
# Then approve

Use the agent's permission model. Keep auto-approve limited to low-risk operations like reading files. Require manual approval for writes, shell commands, and anything touching auth, payments, or data.


title: "Anti-Pattern: Context Dumping" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites:

  • Basic experience with any agentic coding tool

Context Dumping

What It Looks Like

You copy-paste an entire 500-line file into your prompt, prefix it with "here's my code," and then ask the agent to fix a bug on line 42. Or you paste full API documentation, config files, and READMEs into the chat "for context."

Why Developers Do This

It feels helpful -- you're giving the agent everything it needs. With chat-based LLMs (non-agentic), you had to paste code in because the model couldn't read files. Old habits carry over.

Why It Fails

Agentic coding tools can read your filesystem directly. When you paste file contents into the prompt, you waste context window tokens on information the agent could fetch on demand. The pasted content also crowds out space the agent needs for reasoning, planning, and generating output. Large paste blocks can even confuse the agent about which version of a file is current.

The Symptoms

  • Hitting context limits earlier than expected
  • Agent referencing your pasted code instead of the actual file on disk
  • Confusion when the pasted version differs from the saved file
  • Needing /compact far too often

What to Do Instead

Point the agent to files. Let it read what it needs.

# Wrong: pasting the file into the prompt
claude "Here's my auth.ts file: [500 lines of code]. Fix the JWT expiry bug."

# Right: reference the file by path
claude "Fix the JWT expiry bug in src/auth/auth.ts around the token
verification logic"

# Even better: let the agent find it
claude "Users report that JWT tokens expire immediately after login.
Diagnose and fix the issue."

Trust the agent's ability to navigate your codebase. Provide file paths or describe the problem, not raw content.


title: "Anti-Pattern: The Redo Loop" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites:

  • Basic experience with any agentic coding tool

The Redo Loop

What It Looks Like

The agent generates a component. It's not quite right. You say "try again." Still not right. You say "no, redo it but better." Then "make it more like what I described." Five attempts later, you're frustrated, the context window is bloated, and the output is worse than attempt #1.

Why Developers Do This

It mirrors how you might talk to a person: "not that, try again." It feels faster than articulating what specifically is wrong. There's also hope that the next attempt will magically land on the right answer.

Why It Fails

"Try again" gives the agent zero signal about what to change. Each redo consumes context tokens and pushes the original requirements further from the agent's attention. The agent may change things that were already correct, creating a regression spiral.

The Symptoms

  • Output quality decreasing with each attempt
  • Agent changing parts you liked while missing the actual problem
  • Growing frustration on both sides of the prompt
  • Context window filling with failed attempts

What to Do Instead

Give specific, targeted feedback about what is wrong and what "right" looks like.

# Wrong: vague redo requests
"Try again"
"Make it better"
"That's not what I wanted, redo it"

# Right: specific feedback
"The error handling is good, but change the retry logic to use
exponential backoff starting at 100ms with a max of 3 retries.
Keep everything else as-is."

# Even better: point at the exact issue
"In the fetch wrapper on line 15, replace the fixed 1-second delay
with exponential backoff. The rest of the function is correct."

Treat feedback like a code review comment: be specific about what to change and what to preserve.


title: "Anti-Pattern: Empty CLAUDE.md" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites:

  • Basic experience with Claude Code or similar agentic tool

Empty CLAUDE.md

What It Looks Like

You use Claude Code daily but have never created a CLAUDE.md file. Every session, you re-explain your project structure, coding conventions, test commands, and deployment process. The agent asks the same clarifying questions each time.

Why Developers Do This

It doesn't seem important at first. The agent works fine without it. Writing project documentation feels like overhead. "I'll set it up later" becomes never.

Why It Fails

Without CLAUDE.md (or AGENTS.md for Codex CLI), every session starts from zero. The agent has no memory of your project's conventions, architecture decisions, or preferred patterns. You burn context tokens and time re-establishing basics. Worse, the agent may make inconsistent choices across sessions because it has no stable reference.

The Symptoms

  • Repeating the same instructions every session
  • Agent using different coding styles across sessions
  • Agent running the wrong test command or build tool
  • Inconsistent file organization in generated code

What to Do Instead

Create a CLAUDE.md in your project root with essential project context.

<!-- Wrong: no CLAUDE.md at all, or an empty one -->

<!-- Right: CLAUDE.md with practical project context -->
# Project: Acme API

## Build & Test
- `npm run build` to compile TypeScript
- `npm test` runs Jest; `npm run test:e2e` for integration tests
- Always run tests before suggesting a task is complete

## Conventions
- Use TypeScript strict mode; no `any` types
- Error handling: use Result<T, E> pattern, never throw
- File naming: kebab-case for files, PascalCase for components

## Architecture
- src/routes/ — Express route handlers
- src/services/ — Business logic
- src/db/ — Prisma schema and migrations

Start small. Add conventions as you notice the agent getting things wrong. Your CLAUDE.md will grow organically into the most useful file in the repo.


title: "Anti-Pattern: Micro-Managing" last_updated: 2026-03-21 status: proven difficulty: intermediate prerequisites:

  • Familiarity with agentic tool basics
  • Comfort reading generated code

Micro-Managing

What It Looks Like

Instead of telling the agent what to build, you dictate how to build it line by line. "Create a variable called userList of type Array<User>. Then write a for loop that iterates over rawData. Inside the loop, create a new User object..."

Why Developers Do This

It feels safe. If you control every line, nothing can go wrong. It also comes from experience with less capable tools that needed exact instructions. Some developers simply find it hard to let go of implementation details.

Why It Fails

You're paying for an agent but using it as a typist. Micro-managed prompts are slower than writing the code yourself because you're describing code in English instead of just writing it. You also prevent the agent from applying patterns it knows well -- it might have a cleaner approach, but your step-by-step instructions override that.

The Symptoms

  • Prompts that are longer than the generated code
  • Output that looks exactly like you'd write (so why use the agent?)
  • Missing opportunities for better patterns or idioms
  • Slower than just coding it yourself

What to Do Instead

Describe the goal and constraints. Let the agent choose the implementation.

# Wrong: dictating implementation
claude "Create a function called processUsers that takes a parameter
rawData of type unknown[]. Use a for loop to iterate. Inside the loop
use a type guard to check if each item has an email property..."

# Right: describe the goal
claude "Write a function that takes raw API response data, validates
each item has the required User fields, and returns only valid users.
Use type-safe parsing -- no type assertions."

Set guardrails through your CLAUDE.md (preferred libraries, patterns to avoid) rather than micro-managing each prompt. Intervene on the output, not the process.


title: "Anti-Pattern: Premature Automation" last_updated: 2026-03-21 status: proven difficulty: intermediate prerequisites:

  • Familiarity with agentic tool basics
  • Some experience with CI/CD pipelines

Premature Automation

What It Looks Like

On day one with an agentic coding tool, you build a CI pipeline that triggers headless agent runs on every push, wire up custom slash commands for every workflow, and configure multi-agent orchestration -- before you've completed a single interactive session successfully.

Why Developers Do This

Engineers love automating. The docs mention headless mode, custom commands, and CI integration, so it feels productive to set all of that up immediately. There's also a desire to show the team something impressive.

Why It Fails

Automation amplifies whatever you feed it. If you don't yet understand how to write effective prompts, what the agent handles well, and where it struggles, automation will scale your mistakes. You end up debugging the automation layer instead of learning the tool.

The Symptoms

  • CI pipelines that produce broken PRs faster than you can review them
  • Custom commands that nobody uses because they don't match real workflows
  • Hours spent debugging agent orchestration instead of shipping features
  • Reverting to manual work "until the automation is fixed"

What to Do Instead

Follow a maturity progression: interactive first, then scripted, then automated.

# Wrong: jump straight to headless CI on day one
# .github/workflows/agent-pr.yml with full-auto mode

# Right: build skills incrementally
# Week 1-2: Interactive sessions, learn what works
claude "add input validation to the user signup endpoint"

# Week 3-4: Headless mode for repeatable tasks you understand
claude -p "generate unit tests for src/auth/" --output-file tests.txt

# Month 2+: CI integration for patterns you've validated manually

Master the interactive loop first. Automation is the reward for understanding the tool, not a shortcut around it.


title: "Anti-Pattern: Ignoring the Plan" last_updated: 2026-03-21 status: proven difficulty: intermediate prerequisites:

  • Familiarity with agentic tool basics
  • Experience with multi-file changes

Ignoring the Plan

What It Looks Like

You have a complex task -- migrate a database schema, refactor a module across 15 files, or redesign an API. Instead of asking the agent to plan first, you jump straight to "do it." The agent charges ahead, makes changes across the codebase, and you discover halfway through that the approach is wrong.

Why Developers Do This

Planning feels slow. You want results now. The agent seems smart enough to figure it out on the fly. For simple tasks, skipping planning works fine, which builds a false sense that it always works.

Why It Fails

Without a plan, the agent makes local decisions that may be globally inconsistent. It refactors file A one way, then file B a different way, creating a mess. For multi-step tasks, early wrong turns compound. Undoing 15 file changes is far more painful than reviewing a plan.

The Symptoms

  • Agent making contradictory changes across files
  • Having to revert large batches of changes
  • Discovering a better approach after the agent is halfway done
  • Multi-step tasks that spiral into unexpected complexity

What to Do Instead

For any task touching more than a few files, ask for a plan first.

# Wrong: jump straight to execution
claude "Migrate the user table from MongoDB to PostgreSQL"

# Right: plan first, then execute
claude "Plan the migration of our user data from MongoDB to PostgreSQL.
List every file that needs to change, the order of changes, and any
risks. Don't make changes yet."

# Review the plan, give feedback, then:
claude "Execute the migration plan. Start with step 1: the Prisma schema."

Use plan mode (Shift+Tab in Claude Code) for complex work. It's cheaper to revise a plan than to revert code.


title: "Anti-Pattern: Token Burning" last_updated: 2026-03-21 status: proven difficulty: intermediate prerequisites:

  • Familiarity with agentic tool basics
  • Understanding of context windows

Token Burning

What It Looks Like

You never use /compact, you ask the agent to read entire directories "just in case," you include verbose debug logs in the conversation, and you never check /cost. Halfway through a session the agent starts producing incoherent output and you don't know why.

Why Developers Do This

Context windows feel abstract. Tokens are invisible. The tool doesn't stop working when you run out -- it degrades gradually, making the cause hard to diagnose. If you're on an unlimited plan, cost feels irrelevant (but quality still degrades).

Why It Fails

Every agentic tool has a maximum context window. When you fill it with low-value content -- verbose logs, unnecessary file reads, unfocused conversation -- you push out the high-value content the agent needs: your requirements, the relevant code, and its own reasoning. The result is degraded output quality, not a hard error.

The Symptoms

  • Agent output quality dropping mid-session with no clear cause
  • /cost showing unexpectedly high usage
  • Agent "forgetting" earlier instructions or context
  • Sessions that feel sluggish or produce repetitive responses

What to Do Instead

Manage your context window like a scarce resource.

# Wrong: never managing context
claude  # start session
# ... 50 prompts later, quality is terrible, context is full

# Right: active context management
/compact          # summarize and reclaim context regularly
/cost             # check token usage periodically
/clear            # start fresh when switching tasks

# Be selective about what the agent reads
claude "Check only src/auth/token.ts for the expiry bug"
# not
claude "Read the entire src/ directory and find any bugs"

Use /compact proactively after completing each sub-task, not just when things break. Check /cost periodically. Start new sessions for unrelated tasks.


title: "Claude Code Cheatsheet" last_updated: 2026-03-21 tested_with: claude-code: "1.0.x" status: proven difficulty: beginner

Claude Code Cheatsheet

Installation

npm install -g @anthropic-ai/claude-code

Essential Commands

CommandDescription
claudeStart interactive session in current directory
claude "prompt"Start session with an initial prompt
claude --continueResume the most recent conversation
claude --resumePick a previous conversation to resume
claude -p "prompt"Headless mode -- run a single prompt, print result, exit
claude -p "prompt" --output-file out.txtHeadless mode with output saved to file
claude configOpen configuration settings

Slash Commands

CommandDescription
/helpShow all available commands
/clearClear conversation history and start fresh
/compactSummarize conversation to reclaim context window
/modelSwitch to a different model mid-session
/statusShow current session info (model, project, etc.)
/costDisplay token usage and cost for the current session

Mode Switching

ShortcutDescription
Shift+TabToggle between plan mode and code mode

Plan mode lets the agent reason and propose a plan without making changes. Code mode (default) allows the agent to read and write files.

Key Config Files

FilePurpose
CLAUDE.mdProject memory -- conventions, build commands, architecture notes. Loaded automatically at session start.
~/.claude/CLAUDE.mdUser-level memory applied to all projects.
.claude/settings.jsonProject-level settings (allowed/denied tools, permissions).
~/.claude/settings.jsonUser-level settings.
.claude/commands/*.mdCustom slash commands scoped to the project.
~/.claude/commands/*.mdCustom slash commands available globally.

Permission Model

Claude Code asks for approval before performing potentially risky actions.

CategoryExamplesDefault
ReadReading files, listing directoriesAllowed
WriteCreating/editing filesRequires approval
ShellRunning shell commandsRequires approval

Grant standing permissions via .claude/settings.json:

{
  "permissions": {
    "allow": [
      "Edit",
      "Bash(npm test)",
      "Bash(npm run build)"
    ],
    "deny": [
      "Bash(rm -rf *)"
    ]
  }
}

Keyboard Shortcuts

ShortcutAction
EnterSend message
Shift+TabToggle plan/code mode
EscapeCancel current generation
Ctrl+CCancel or exit
Up arrowCycle through previous prompts

Common Flags

FlagDescription
-p, --printHeadless mode (non-interactive)
--continueResume last conversation
--resumePick a conversation to resume
--output-fileWrite headless output to a file
--modelSpecify model to use
--verboseEnable verbose logging
--allowedToolsSpecify which tools the agent can use
--max-turnsLimit number of agentic turns in headless mode

Environment Variables

VariablePurpose
ANTHROPIC_API_KEYAPI key for direct Anthropic access
CLAUDE_CODE_USE_BEDROCKSet to 1 to use AWS Bedrock as provider
CLAUDE_CODE_USE_VERTEXSet to 1 to use Google Vertex AI as provider
AWS_REGIONAWS region for Bedrock
AWS_PROFILEAWS profile for Bedrock authentication
CLOUD_ML_REGIONGoogle Cloud region for Vertex AI
ANTHROPIC_MODELOverride the default model
DISABLE_PROMPT_CACHINGSet to 1 to disable prompt caching

Quick Tips

  • Run /compact after finishing each sub-task to keep context clean.
  • Put build/test commands in CLAUDE.md so the agent runs them correctly every time.
  • Use claude -p in scripts and CI pipelines for non-interactive use.
  • Start with strict permissions and relax them as you build trust.

title: "Codex CLI Cheatsheet" last_updated: 2026-03-21 tested_with: codex-cli: "0.2.x" status: proven difficulty: beginner

Codex CLI Cheatsheet

Installation

npm install -g @openai/codex

Essential Commands

CommandDescription
codexStart interactive session in current directory
codex "prompt"Start session with an initial prompt
codex --approval-mode suggestAgent suggests commands but never executes (safest)
codex --approval-mode auto-editAgent can edit files, but shell commands need approval
codex --approval-mode full-autoAgent runs everything without asking (use with caution)

Approval Modes

ModeFile ReadsFile WritesShell Commands
suggest (default)AllowedNeeds approvalNeeds approval
auto-editAllowedAllowedNeeds approval
full-autoAllowedAllowedAllowed (sandboxed)

Key Config Files

FilePurpose
AGENTS.mdProject memory -- equivalent to Claude Code's CLAUDE.md. Conventions, build commands, architecture notes.
~/.codex/config.yamlUser-level configuration (default model, approval mode, etc.)
~/.codex/instructions.mdUser-level instructions applied to all projects

The Sandbox Model

Codex CLI uses network-disabled sandboxing for command execution, especially in full-auto mode:

  • Network access is disabled by default for all shell commands.
  • Commands run in a sandboxed environment using platform-specific isolation (macOS Seatbelt, Linux namespaces).
  • File writes outside the project directory are blocked in the sandbox.
  • The sandbox protects against accidental damage but is not a security boundary against adversarial prompts.

This means full-auto is safer than it sounds -- but you should still review diffs before committing.

Common Flags

FlagDescription
--approval-modeSet approval mode: suggest, auto-edit, full-auto
--modelSpecify model (default: o4-mini)
--quietSuppress non-essential output
--notifySend desktop notification when task completes
--no-project-docSkip loading AGENTS.md

Environment Variables

VariablePurpose
OPENAI_API_KEYAPI key for OpenAI access (required)
OPENAI_BASE_URLCustom API endpoint for proxies or compatible providers
CODEX_HOMEOverride default config directory (~/.codex)

Key Differences from Claude Code

AspectClaude CodeCodex CLI
ProviderAnthropic (Claude)OpenAI (o4-mini, o3, etc.)
Project memoryCLAUDE.mdAGENTS.md
Permission modelApproval per action type with allowlistsThree approval modes (suggest/auto-edit/full-auto)
SandboxNo built-in sandbox; relies on approval promptsNetwork-disabled sandbox for shell commands
Headless modeclaude -p "prompt"Not a primary workflow (interactive-first)
Context management/compact, /clear slash commandsAutomatic context management
Plan modeBuilt-in (Shift+Tab)Not a separate mode
Default modelClaude Sonneto4-mini
Config formatJSON (.claude/settings.json)YAML (config.yaml)

Quick Tips

  • Start with suggest mode until you trust the tool with your codebase.
  • Write an AGENTS.md with build commands and conventions, just like CLAUDE.md.
  • The sandbox blocks network access, so commands needing the internet (e.g., npm install) will fail in full-auto. Approve those manually.
  • Use --notify for long-running tasks so you can context-switch.
  • Codex CLI works well for focused, single-task sessions. Keep prompts concise.

title: "Starter CLAUDE.md Template" tested_with: claude-code: "1.0.x" last_updated: 2026-03-21 status: battle-tested difficulty: beginner prerequisites: []

Starter CLAUDE.md Template

What This Config Does

A minimal but effective CLAUDE.md that gives Claude Code essential context about your project. This is the "first 5 minutes" config — enough to see immediate improvement in agent output quality.

The Config

# Project: [Your Project Name]

[One sentence describing what this project does.]

## Tech Stack

- Language: [e.g., TypeScript]
- Framework: [e.g., Next.js 14]
- Database: [e.g., PostgreSQL with Prisma ORM]
- Testing: [e.g., Vitest]

## Key Conventions

- [Your most important convention, e.g., "Use functional components with hooks, not class components"]
- [Your second most important convention, e.g., "All API routes go in src/app/api/"]
- [Your third most important convention, e.g., "Run `npm test` before committing"]

## Common Commands

- Build: `npm run build`
- Test: `npm test`
- Dev server: `npm run dev`
# Adapt this by:
# - Replace all bracketed placeholders with your project's actual details
# - Start with 3-5 conventions max — add more over time as you notice repeated corrections
# - Keep the whole file under 50 lines to start

Where to Put It

Place at the root of your project directory as CLAUDE.md. Claude Code automatically reads it at the start of every session.

How to Verify It Works

After creating the file, start a new Claude Code session and ask:

What are the key conventions for this project?

The agent should recite your conventions back to you. If it doesn't mention them, check that the file is named exactly CLAUDE.md (case-sensitive) and is in the directory where you launched Claude Code.

Notes

  • This is a starting point. See Module 02: Project Memory for the full progression from starter to advanced CLAUDE.md.
  • Don't try to document everything upfront. Add conventions as you notice the agent making the same mistake twice.
  • Keep it concise — CLAUDE.md is loaded into every session's context window.

title: "Starter AGENTS.md Template" tested_with: codex-cli: "0.2.x" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: []

Starter AGENTS.md Template

What This Config Does

A minimal AGENTS.md that gives Codex CLI (and other tools supporting the open standard) essential context about your project.

The Config

# Project: [Your Project Name]

[One sentence describing what this project does.]

## Setup

- Language: [e.g., Python 3.12]
- Package manager: [e.g., pip with requirements.txt]
- Test runner: [e.g., pytest]

## Conventions

- [e.g., "Use type hints on all function signatures"]
- [e.g., "Follow PEP 8 formatting"]
- [e.g., "Tests go in tests/ mirroring the src/ structure"]

## Commands

- Install: `pip install -r requirements.txt`
- Test: `pytest`
- Lint: `ruff check .`
# Adapt this by:
# - Replace placeholders with your project specifics
# - AGENTS.md follows the same principles as CLAUDE.md — start minimal, grow over time
# - If you use both Claude Code and Codex CLI, you can have both CLAUDE.md and AGENTS.md in the same repo

Where to Put It

Place at the root of your project directory as AGENTS.md. Codex CLI reads it automatically.

How to Verify It Works

Start a new Codex session and ask:

What testing framework does this project use?

It should reference the details from your AGENTS.md.

Notes

  • AGENTS.md is an open standard supported by 60K+ projects and stewarded by the Linux Foundation
  • Both CLAUDE.md and AGENTS.md can coexist — each tool reads its own config file
  • See Module 02: Project Memory for advanced configuration patterns

title: "Intermediate CLAUDE.md Template" tested_with: claude-code: "1.0.x" last_updated: 2026-03-21 status: proven difficulty: intermediate prerequisites: [02-project-memory]

Intermediate CLAUDE.md Template

What This Config Does

A more detailed CLAUDE.md for teams or complex projects. Includes architecture context, testing standards, and common pitfalls. Use this after you've outgrown the starter template.

The Config

# [Project Name]

[2-3 sentence description of what this project does and who it serves.]

## Architecture

- `src/api/` — REST API endpoints (Express)
- `src/services/` — Business logic (no framework dependencies)
- `src/models/` — Database models (Prisma)
- `src/utils/` — Shared utilities
- `tests/` — Mirrors src/ structure

## Tech Stack

- Runtime: Node.js 20 with TypeScript 5.x
- Framework: Express 4.x
- Database: PostgreSQL 16 with Prisma ORM
- Testing: Vitest + Supertest for API tests
- CI: GitHub Actions

## Conventions

- Use `async/await`, never raw Promises or callbacks
- All functions must have TypeScript return types
- Business logic goes in `src/services/`, not in route handlers
- Route handlers only do: parse request → call service → format response
- Use Zod for runtime input validation on all API endpoints
- Error responses follow RFC 7807 (Problem Details)

## Testing Standards

- Every new function needs a unit test
- Every new API endpoint needs an integration test
- Run `npm test` before suggesting changes are complete
- Test file naming: `foo.test.ts` next to `foo.ts`

## Common Commands

- Dev: `npm run dev` (starts on port 3000)
- Test: `npm test`
- Test watch: `npm run test:watch`
- Build: `npm run build`
- Lint: `npm run lint`
- DB migrate: `npx prisma migrate dev`

## Do NOT

- Do not use `any` type — use `unknown` and narrow
- Do not put business logic in route handlers
- Do not use string concatenation for SQL — always use Prisma
- Do not skip writing tests for new code
# Adapt this by:
# - Replace all specifics with your project's details
# - The "Do NOT" section is high-value — add rules for mistakes that cost you time
# - Keep under 80 lines — if it's longer, consider directory-level CLAUDE.md files

Where to Put It

Project root as CLAUDE.md. For large monorepos, also create CLAUDE.md files in subdirectories for directory-specific instructions.

How to Verify It Works

Ask the agent to implement a new API endpoint. It should:

  • Place the route in src/api/
  • Put business logic in src/services/
  • Add Zod validation
  • Create both unit and integration tests
  • Use async/await (not callbacks)

If it doesn't follow these conventions, your CLAUDE.md needs adjustment.

Notes

  • This is the "sweet spot" level for most projects. See Module 02 for when to go beyond this.
  • Review and update quarterly. Stale CLAUDE.md files teach the agent outdated patterns.

title: "Pre-Commit Lint Hook for Claude Code" tested_with: claude-code: "1.0.x" last_updated: 2026-03-21 status: proven difficulty: intermediate prerequisites: [04-hooks-and-commands]

Pre-Commit Lint Hook

What This Config Does

Automatically runs your project's linter after Claude Code edits a file. If the lint fails, the agent sees the errors and can fix them in the same session.

The Config

Add to your .claude/settings.json (project-level):

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hook": "npx eslint --fix $(echo $TOOL_OUTPUT | jq -r '.filePath // empty') 2>&1 || true"
      }
    ]
  }
}
# Adapt this by:
# - Replace `npx eslint --fix` with your project's linter command
# - For Python: `ruff check --fix $FILE`
# - For Go: `gofmt -w $FILE`
# - For Rust: `cargo fmt -- $FILE`
# - The `|| true` prevents the hook from blocking the agent on lint errors

Where to Put It

.claude/settings.json in your project root for project-level, or ~/.claude/settings.json for global.

How to Verify It Works

Ask Claude Code to create a file with intentional style issues (e.g., missing semicolons in JS). After the file is written, the hook should auto-fix and you'll see the lint output in the session.

Notes

  • This runs after EVERY file edit, which adds a small delay per edit. If your linter is slow, consider running it only on specific file types.
  • The hook output is visible to the agent, so it can react to lint failures.
  • See Module 04: Hooks & Commands for more hook patterns.

title: "Session Cost Tracking Hook" tested_with: claude-code: "1.0.x" last_updated: 2026-03-21 status: experimental difficulty: intermediate prerequisites: [04-hooks-and-commands]

Session Cost Tracking Hook

What This Config Does

Logs a notification when a session ends, reminding you to check token usage. Useful for building cost awareness without disrupting workflow.

The Config

Add to your .claude/settings.json:

{
  "hooks": {
    "Stop": [
      {
        "hook": "echo '💰 Session ended. Run /cost in your next session to review token usage.'"
      }
    ]
  }
}
# Adapt this by:
# - Replace the echo with a script that logs to a file for tracking over time
# - Example: echo \"$(date),session_end\" >> ~/.claude/cost-log.csv

Where to Put It

~/.claude/settings.json (user-level) so it applies to all projects.

How to Verify It Works

End a Claude Code session. You should see the cost reminder message.

Notes

  • This is a simple awareness hook. For automated cost tracking, see Module 09: Headless & CI/CD.
  • Token costs vary significantly by task complexity. Typical ranges: simple tasks $0.01-0.05, complex features $0.10-0.50, large refactors $0.50-2.00.

title: "Bug Fix Prompt Template" tested_with: claude-code: "1.0.x" codex-cli: "0.2.x" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: []

Bug Fix Prompt Template

When to Use This Prompt

When you've identified a bug and want the agent to investigate and fix it.

The Prompt

There's a bug where [describe the symptom — what happens vs. what should happen].

The relevant code is likely in [file or directory]. [Optional: describe any reproduction steps.]

Please:
1. Read the relevant code and identify the root cause
2. Fix the bug
3. Add a test that would have caught this bug

# Adapt this by:
# - Replace [symptom] with what you actually observe
# - Replace [file or directory] with where you think the issue is (or omit if unsure)
# - Add reproduction steps if you have them
# - Remove the test step if tests aren't set up yet

Why It Works

  • Symptom-first: Tells the agent what's wrong without prescribing the fix
  • Scoped: Points to the likely location, reducing search time
  • Three-step structure: Read → Fix → Test prevents the agent from jumping to conclusions
  • Test requirement: Ensures the fix is verified and prevents regression

Variations

When you don't know where the bug is:

Users report that [symptom]. I don't know where the relevant code is.
Search the codebase for code related to [feature area] and identify the root cause.

When you know the exact cause:

In [file:line], [describe the incorrect behavior]. This should instead [describe correct behavior].
Fix it and update any tests affected by the change.

Example Output

The agent should: read the file(s), explain what it found, show the fix, and write a test. If it jumps straight to editing without reading first, that's a sign to use the explore-before-change pattern.


title: "Feature Addition Prompt Template" tested_with: claude-code: "1.0.x" codex-cli: "0.2.x" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: []

Feature Addition Prompt Template

When to Use This Prompt

When adding a new feature or capability to an existing codebase.

The Prompt

Add [feature description] to [component/area].

Requirements:
- [Requirement 1]
- [Requirement 2]
- [Requirement 3]

Follow the pattern used in [existing similar feature] for consistency.

# Adapt this by:
# - Be specific about requirements — vague features get vague implementations
# - Reference an existing feature for the agent to follow as a pattern
# - If the feature is complex, consider using plan mode first

Why It Works

  • Clear scope: Names the feature and where it goes
  • Explicit requirements: Prevents the agent from guessing what you want
  • Pattern reference: Points to existing code as a template, ensuring consistency with your codebase

Variations

For complex features (use with plan mode):

I need to add [feature]. Before implementing, create a plan:
1. What files need to change?
2. What's the approach?
3. Are there any edge cases to handle?

Show me the plan before making changes.

For features with UI:

Add [feature] to [page/component]. It should:
- [Visual behavior]
- [User interaction]
- [Data flow]

Match the style of [existing similar component].

Example Output

The agent should: identify relevant files, implement the feature following the referenced pattern, and add tests. If it creates a wildly different structure from your existing code, your CLAUDE.md may need stronger convention guidance.


title: "Code Review Prompt Template" tested_with: claude-code: "1.0.x" codex-cli: "0.2.x" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: []

Code Review Prompt Template

When to Use This Prompt

When you want the agent to review code for quality, bugs, security, or adherence to conventions.

The Prompt

Review [file or directory] for:
- Bugs or logic errors
- Security issues
- Performance concerns
- Adherence to our project conventions

For each issue found, explain:
1. What the problem is
2. Why it matters
3. How to fix it

Don't fix anything yet — just report findings.

# Adapt this by:
# - Narrow the focus: "Review for SQL injection vulnerabilities" is better than "review everything"
# - Replace the checklist with your team's specific review criteria
# - Add "Don't fix anything yet" if you want review-only; remove it if you want auto-fix

Why It Works

  • Structured output: The 3-part format (what, why, how) gives actionable findings
  • Review-only mode: Separating review from fixing prevents premature changes
  • Scoped criteria: Telling the agent what to look for prevents generic platitudes

Variations

Review recent changes:

Review the changes in the current git diff. Focus on correctness and whether the changes match the intent described in the most recent commit message.

Security-focused review:

Perform a security review of [file]. Check for: input validation, injection vulnerabilities, authentication/authorization issues, and sensitive data handling.

Review then fix:

Review [file] for issues. Present your findings. After I approve, fix the issues you found.

Example Output

Good output lists specific findings with line numbers and concrete fixes. If the agent returns only vague praise ("the code looks well-structured"), the scope was too broad — narrow your review criteria.


title: "Refactoring Prompt Template" tested_with: claude-code: "1.0.x" codex-cli: "0.2.x" last_updated: 2026-03-21 status: proven difficulty: intermediate prerequisites: [03-prompting-for-agents]

Refactoring Prompt Template

When to Use This Prompt

When restructuring existing code without changing its external behavior.

The Prompt

Refactor [file or function] to [goal — e.g., "reduce duplication", "improve readability", "extract into separate module"].

Constraints:
- Don't change the public API / function signatures
- Keep all existing tests passing
- Run tests after refactoring to verify

# Adapt this by:
# - Be specific about the goal — "refactor" alone is too vague
# - The constraint about public API prevents breaking callers
# - Always include the test verification step

Why It Works

  • Goal-oriented: Names exactly what improvement to make
  • Constrained: Explicitly preserves external behavior
  • Verified: Tests confirm nothing broke

Variations

Extract pattern:

The logic in [file] from lines [X] to [Y] is duplicated in [other file]. Extract it into a shared utility in [target location] and update both callers.

Simplification:

[function] is hard to read because [reason — e.g., "deeply nested conditions", "too many parameters"]. Simplify it without changing behavior. Run tests after.

Module split:

[file] has grown too large ([X] lines). Split it into focused modules:
- [module A] for [responsibility]
- [module B] for [responsibility]
Keep the public exports the same so callers don't need to change.

Example Output

The agent should: read the code, explain its refactoring plan, make changes, and run tests. If tests fail after refactoring, the agent should fix the issues — not skip the tests.


title: "Codebase Exploration Prompt Template" tested_with: claude-code: "1.0.x" codex-cli: "0.2.x" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: []

Codebase Exploration Prompt Template

When to Use This Prompt

When you need to understand unfamiliar code, onboard to a new project, or map out a feature's implementation.

The Prompt

Explore this codebase and explain:
1. What the project does (one paragraph)
2. The key directories and what each contains
3. The main entry point(s) and how a request flows through the system
4. The 5 most important files and why they matter

# Adapt this by:
# - Narrow to a specific area: "Explore the authentication system"
# - Add specific questions: "How does the billing module calculate charges?"
# - Ask for a specific output format if needed

Why It Works

  • Structured exploration: The numbered format prevents rambling summaries
  • Progressive depth: Overview → structure → flow → specifics
  • Actionable output: "The 5 most important files" gives you a reading list

Variations

Feature tracing:

Trace how [feature — e.g., "user login"] works from the UI to the database. Show the call chain with file names and function names.

Dependency mapping:

What external dependencies does [module/file] rely on? For each, explain what it's used for and whether there are alternatives.

Architecture decision investigation:

Why is [thing] implemented as [pattern]? Look at git history and code comments for context. Is this deliberate architecture or accumulated debt?

Example Output

Good output gives you a mental map you can navigate by. If the agent just lists every file, the scope was too broad — ask about a specific subsystem.


title: "Agent Ignores Project Conventions" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: [02-project-memory]

Problem: Agent Ignores Project Conventions

Symptoms

  • Agent uses var when you require const/let
  • Agent creates files in wrong directories
  • Agent uses a different testing framework than your project
  • Agent doesn't follow your naming conventions

Common Causes

  1. No CLAUDE.md / AGENTS.md — The agent has no way to know your conventions without project memory.
  2. Conventions are too vague — "Follow best practices" means nothing. "Use arrow functions, not function declarations" is actionable.
  3. CLAUDE.md is too long — If your config file is 200+ lines, important instructions get diluted.
  4. Convention conflicts with codebase — If your existing code doesn't follow the conventions you wrote, the agent may follow the code patterns instead.

Solutions

For Cause 1: Create Project Memory

Create a CLAUDE.md or AGENTS.md with your conventions. See Starter Template.

For Cause 2: Be Specific

Replace vague instructions with precise ones:

# Bad
- Follow good coding practices

# Good
- Use `const` by default, `let` only when reassignment is needed, never `var`
- Name files in kebab-case: `user-profile.ts`, not `userProfile.ts`

For Cause 3: Trim Your Config

Keep CLAUDE.md under 80 lines. Move detailed documentation elsewhere and keep only actionable instructions in the config.

For Cause 4: Fix the Codebase First

If your existing code uses inconsistent patterns, fix the most visible files first. The agent learns from what it reads.

Prevention

  • Start with the Starter CLAUDE.md on day one
  • Add one convention each time you have to correct the agent manually
  • Use the "Convention Test" exercise from Module 02

title: "Agent Goes Off Track on Complex Tasks" last_updated: 2026-03-21 status: proven difficulty: intermediate prerequisites: [03-prompting-for-agents]

Problem: Agent Goes Off Track on Complex Tasks

Symptoms

  • Agent starts solving a different problem than what you asked
  • Agent makes increasingly wrong assumptions as the session continues
  • Changes pile up that don't relate to the original task
  • Agent edits files you didn't expect it to touch

Common Causes

  1. Task too large — The task exceeds what the agent can hold in focus. Context drifts.
  2. Ambiguous prompt — The agent interpreted your request differently than you intended.
  3. Context window full — As the session grows, earlier instructions get compressed or lost.

Solutions

For Cause 1: Decompose the Task

Break it into smaller, focused subtasks:

# Instead of:
"Refactor the entire authentication system"

# Do:
"Refactor the login function in src/auth/login.ts to use async/await instead of callbacks. Don't change anything else."

For Cause 2: Use Plan Mode

Ask the agent to plan before executing:

Before making changes, explain your plan:
1. What files will you modify?
2. What approach will you take?
3. What won't you change?

Review the plan. If it's off track, correct it before any code changes happen.

For Cause 3: Start a Fresh Session

Use checkpoint commits to save progress, then start a new session:

git add -A && git commit -m "WIP: checkpoint before next phase"
claude  # fresh session with full context window

Prevention

  • Use the plan-then-execute pattern for any task touching 3+ files
  • Use checkpoint commits every 15-20 minutes on complex tasks
  • Keep individual prompts focused on one change at a time
  • If the session feels "heavy," start fresh — it's cheaper than correcting drift

title: "Unexpectedly High Token Usage" last_updated: 2026-03-21 status: proven difficulty: intermediate prerequisites: [03-prompting-for-agents]

Problem: Unexpectedly High Token Usage

Symptoms

  • API costs are higher than expected
  • Sessions feel slow and expensive
  • The agent reads many files you didn't ask about
  • Simple tasks consume a surprising number of tokens

Common Causes

  1. Large context window — CLAUDE.md is too long, or the agent is reading entire large files when it only needs a section.
  2. The redo loop — You keep asking the agent to redo the same task with slightly different prompts instead of giving targeted corrections.
  3. No use of /compact — Long sessions accumulate context that's no longer relevant.
  4. Unnecessary exploration — The agent searches broadly when you could point it to the specific file.

Solutions

For Cause 1: Trim Context

# Keep CLAUDE.md concise — under 80 lines
# Point the agent to specific files instead of letting it search:

"Fix the bug in src/auth/login.ts on line 42"
# instead of
"Fix the login bug"

For Cause 2: Give Targeted Feedback

# Instead of:
"No, try again"

# Do:
"The function signature is correct, but the error handling on line 15 should use a try/catch instead of .catch(). Keep everything else."

For Cause 3: Use /compact

In Claude Code, run /compact when:

  • A session has been running for 20+ minutes
  • You're switching to a different subtask
  • The agent starts referencing outdated context

For Cause 4: Be Specific About Location

# Expensive (agent searches everything):
"Find and fix the performance issue"

# Cheaper (agent goes straight to the file):
"The performance issue is in src/api/search.ts — the database query on line 89 is missing an index. Fix it."

Prevention

  • Check /cost periodically during long sessions
  • Use the checkpoint-commit pattern to start fresh sessions
  • Point the agent to specific files when you know the location
  • See the token-burning anti-pattern for more strategies

title: "Practice Project: Agent-Assisted CLI Todo App" last_updated: 2026-03-21 status: proven difficulty: beginner prerequisites: [01-your-first-hour, 02-project-memory] estimated_time: "90 minutes"

Practice Project: Agent-Assisted CLI Todo App

Overview

You are going to build a command-line todo application from scratch, entirely through collaboration with an AI coding agent. The todo app itself is not the point. The point is practicing the agentic development skills you learned in Modules 00 through 02 — giving clear instructions, reviewing output critically, using project memory, and iterating effectively.

By the end of this project, you will have:

  • Scaffolded a project by giving an agent clear architectural direction
  • Built features using the plan-then-execute pattern
  • Created and tested a project memory file that measurably improves agent output
  • Used the test-first pattern to drive implementation through tests
  • Reflected on what the agent did well and where it needed correction

This project takes about 90 minutes. Each phase has a time estimate, but do not rush — the reflection at the end is where the learning compounds.

What You Will Build

A CLI todo app that supports adding tasks, listing them, marking them complete, and deleting them. The app stores todos in memory for now (persistence is a bonus challenge). It runs from the terminal and accepts commands as arguments.

What You Will Practice

SkillWhere Practiced
Clear architectural promptingPhase 1
Plan-then-execute patternPhase 2
Critical output reviewPhase 2
Project memory creation and testingPhase 3
Test-first agent patternPhase 4
Structured reflectionPhase 5

Prerequisites

Before starting, confirm all of the following:

  • Your agentic coding tool is installed and working (claude or codex)
  • Your API key is configured and you have verified it works by running a simple prompt
  • You have completed Module 01: Your First Hour and Module 02: Project Memory
  • You have a language runtime installed for your language of choice (Node.js, Python, Go, Rust — any works)
  • Git is installed and available on your system

Cost: This project will consume roughly $2-6 in API tokens depending on your language choice, how many iterations you need, and whether you attempt the bonus challenges.


Phase 1: Setup with the Agent (15 minutes)

Goal: Scaffold the project structure by giving the agent clear architectural instructions.

Step 1: Create the project directory

Create an empty directory for your todo app and initialize a git repository:

mkdir cli-todo-app
cd cli-todo-app
git init

Step 2: Start a session and give the agent your architecture

Start your agent and provide a prompt that covers four things: what to build, which language and tools to use, the basic structure, and any constraints. Here is a template:

I want to build a CLI todo app. Here are my requirements:

Language: [your choice — Python, Node.js, Go, etc.]
No external dependencies for core functionality.
The app should accept commands as CLI arguments, like:
  todo add "Buy groceries"
  todo list
  todo complete 1
  todo delete 1

Scaffold the project with:
- An entry point file
- A module for todo operations (add, list, complete, delete)
- A data structure to hold todos in memory
- A README with usage instructions

Do not implement the features yet — just set up the project structure
with placeholder functions.

Note: Adapt the language choice to your preference. The rest of this project uses Python in its examples, but every phase works with any language.

Step 3: Review the scaffold critically

Before accepting the agent's output, review it against these criteria:

  • Structure: Did it create the files you asked for? Are they organized sensibly?
  • Scope: Did it only scaffold, or did it start implementing features? If it implemented features you did not ask for, note that — scope creep from agents is common.
  • Conventions: Does the code follow standard conventions for your language?
  • Placeholders: Are the placeholder functions clear enough that you know what each one will do?

Run git diff (or git status for new files) to see exactly what the agent created. Commit the scaffold:

git add -A
git commit -m "Initial scaffold"

What to watch for

This phase is about architectural communication. Did the agent follow your structural instructions, or did it impose its own ideas? If it deviated, was its choice better than yours, or did it just ignore what you said? Both outcomes are instructive. If the agent made a better structural choice, you have learned something about the tool. If it ignored your instructions, you have learned something about how to be more specific.


Phase 2: Core Features (30 minutes)

Goal: Build the four core operations using the plan-then-execute pattern.

Step 1: Ask for a plan first

Do not ask the agent to implement all four features at once. Start with a plan:

Before writing any code, outline your plan for implementing these four
operations: add, list, complete, delete.

For each operation, describe:
- What data it needs
- What it does
- What output it produces
- What error cases it should handle

Review the plan. Does it match your expectations? Are the error cases reasonable? If something is off, correct it now — it is far cheaper to fix a plan than to fix an implementation.

Step 2: Implement one feature at a time

Work through the features in this order. For each one, give the agent a focused prompt, review the output, and commit before moving on.

Add:

Implement the "add" operation. It should accept a description string,
create a todo item with a unique ID, and store it in the in-memory list.
Print a confirmation message with the todo's ID.

Review the code. Run it. Does it work? Commit.

git add -A
git commit -m "Implement add operation"

List:

Implement the "list" operation. It should display all todos with their
ID, description, and status (pending or complete). If there are no todos,
print a helpful message instead of empty output.

Review, run, commit.

Complete:

Implement the "complete" operation. It takes a todo ID, marks that todo
as complete, and prints a confirmation. If the ID doesn't exist, print
an error message.

Review, run, commit.

Delete:

Implement the "delete" operation. It takes a todo ID, removes that todo
from the list, and prints a confirmation. If the ID doesn't exist, print
an error message.

Review, run, commit.

Step 3: Integration test by hand

Run through the full workflow manually:

# Adapt these commands to your language
python todo.py add "Buy groceries"
python todo.py add "Write report"
python todo.py list
python todo.py complete 1
python todo.py list
python todo.py delete 2
python todo.py list

If anything is broken, tell the agent what went wrong and let it fix it. Do not fix the code yourself — practice giving the agent clear feedback about the failure.

What to watch for

The plan-then-execute pattern is the key skill here. Notice how reviewing a plan before implementation lets you catch architectural issues early. Notice how working feature-by-feature with commits between each one gives you clean rollback points. Notice how focused prompts produce more reviewable output than a single "implement everything" prompt would.

Also pay attention to how the agent handles error cases. Did it handle edge cases you did not think of? Did it miss obvious ones? This tells you where the agent's judgment is reliable and where it needs your guidance.


Phase 3: Project Memory (15 minutes)

Goal: Create a CLAUDE.md (or AGENTS.md) for the todo app and measure its impact.

Step 1: Write the project memory file

Create a CLAUDE.md (for Claude Code) or AGENTS.md (for Codex CLI) in the project root. Write it yourself — do not ask the agent to write it. You know the project now. Keep it under 10 lines.

# CLAUDE.md

CLI todo app built with [your language]. No external dependencies.
Stores todos in memory (no persistence yet).

## Commands
- `[your run command] add "description"` — add a todo
- `[your run command] list` — show all todos
- `[your run command] complete <id>` — mark a todo done
- `[your run command] delete <id>` — remove a todo

## Conventions
- All user-facing output goes to stdout
- Errors go to stderr
- Exit code 0 on success, 1 on error
- [Add one convention specific to your language, e.g., "Use type hints on all functions"]

Commit it:

git add CLAUDE.md
git commit -m "Add project memory file"

Step 2: Test without project memory

Start a new session and ask the agent to add a feature without the project memory file available. Temporarily rename it:

mv CLAUDE.md CLAUDE.md.bak

Now start a fresh session and give this prompt:

Add an "edit" command that lets the user change the description of an
existing todo. Usage: todo edit <id> "new description"

Note the output. Pay attention to:

  • Did the agent understand the project structure?
  • Did it follow your existing conventions (output to stdout, errors to stderr, exit codes)?
  • Did it know how to run the app?
  • How many corrections did you need to make?

Do not commit this. Undo the changes:

git checkout .

Step 3: Test with project memory

Restore the project memory file and start a new session:

mv CLAUDE.md.bak CLAUDE.md

Give the exact same prompt:

Add an "edit" command that lets the user change the description of an
existing todo. Usage: todo edit <id> "new description"

Compare the output to the previous attempt:

  • Did the agent follow your conventions this time?
  • Did it structure the code consistently with the existing features?
  • How many corrections did you need?

If the project memory made a noticeable difference, commit this version:

git add -A
git commit -m "Add edit operation"

What to watch for

This is the experiment that makes project memory concrete. The side-by-side comparison shows you exactly what context the agent was missing and how a small configuration file fills the gap. Some differences will be dramatic (the agent now follows your error conventions). Others will be subtle (slightly more consistent code style). Both matter over the life of a project.


Phase 4: Testing (20 minutes)

Goal: Use the test-first pattern — ask the agent to write tests, then implement to make them pass.

Step 1: Ask the agent to write tests first

Write unit tests for the todo operations module. Cover these cases:

add:
- Adding a todo returns a todo with a unique ID
- Adding multiple todos assigns different IDs

list:
- Listing with no todos returns an empty list
- Listing returns all added todos with correct status

complete:
- Completing a valid ID changes status to complete
- Completing an invalid ID raises an error or returns an error indicator

delete:
- Deleting a valid ID removes the todo
- Deleting an invalid ID raises an error or returns an error indicator

Use [your language's standard test framework — pytest, jest, go test, etc.].
Do not modify the implementation — only write tests.

Step 2: Run the tests

# Adapt to your test runner
pytest tests/
# or: npm test
# or: go test ./...

Some tests may fail. That is expected — the test-first pattern intentionally reveals gaps between the test's expectations and the current implementation.

Step 3: Fix failures with the agent

For each failing test, tell the agent what failed and ask it to fix the implementation (not the test):

The test `test_complete_invalid_id` is failing because the complete
function doesn't raise a ValueError when given an ID that doesn't exist.
Fix the implementation to handle this case. Do not modify the test.

Run the tests again after each fix. Repeat until all tests pass.

Step 4: Commit

git add -A
git commit -m "Add unit tests and fix edge cases"

What to watch for

The test-first pattern flips the usual agent workflow. Instead of asking the agent to implement a feature and hoping it handles edge cases, you define the edge cases upfront as tests. The agent then has a concrete target: make the tests pass. This produces more reliable code because the agent is working toward an unambiguous success criterion instead of interpreting your intent.

Notice also how the agent handles the constraint "do not modify the test." Does it respect that boundary, or does it try to change the tests to match its preferred implementation? Agents sometimes resist constraints, and recognizing this tendency helps you enforce boundaries in larger projects.


Phase 5: Reflection (10 minutes)

Goal: Review what happened and capture lessons for future agent collaboration.

Step 1: Review the full project

Look at the complete codebase:

git log --oneline
git diff HEAD~5..HEAD --stat

Scan through the files. How much of this code did you write versus the agent? How much of the agent's code did you have to correct?

Step 2: Fill out the reflection template

Copy this template into a file or a scratch document. Answer each question honestly.

## Project Reflection

### What the agent did well
- [ ] Scaffolding and project structure
- [ ] Implementing straightforward CRUD operations
- [ ] Following conventions from CLAUDE.md
- [ ] Writing tests
- [ ] Other: ___

### What needed correction
- [ ] Scope creep (implemented more than asked)
- [ ] Missed error cases
- [ ] Inconsistent code style
- [ ] Ignored constraints
- [ ] Other: ___

### Key observations
1. The most effective prompt I wrote was: ___
2. The least effective prompt I wrote was: ___
3. The agent was fastest at: ___
4. The agent struggled most with: ___
5. Project memory made the biggest difference for: ___

### What I would do differently next time
1. ___
2. ___
3. ___

### Time spent
- Phase 1 (Setup): ___ minutes
- Phase 2 (Features): ___ minutes
- Phase 3 (Memory): ___ minutes
- Phase 4 (Testing): ___ minutes
- Phase 5 (Reflection): ___ minutes
- Total: ___ minutes

Step 3: Update your CLAUDE.md

Based on what you learned, add one or two lines to your project memory file. What did the agent get wrong that a configuration line could prevent next time?


Bonus Challenges

Each of these is a self-contained agent task. Use the skills you practiced above — plan first, implement incrementally, review critically, commit often.

Challenge 1: Add Persistence

Ask the agent to save todos to a JSON file so they survive between sessions. This tests whether the agent can modify an existing architecture (in-memory to file-based) without breaking the current interface.

Add file-based persistence to the todo app. Save todos to a todos.json
file in the current directory. Load from the file on startup, save after
every change. The CLI interface should not change — all existing commands
should work exactly as before.

Challenge 2: Add Priorities

Ask the agent to add priority levels (high, medium, low) to todos. This tests whether the agent can extend a data model and update all downstream code that touches it.

Add priority support. Each todo should have a priority: high, medium,
or low. Default to medium if not specified. Update the add command to
accept an optional --priority flag. Update the list command to show
priority and sort by priority (high first). Update the data model and
all relevant tests.

Challenge 3: Add Due Dates

Ask the agent to add due dates with overdue highlighting. This tests whether the agent handles dates correctly and integrates display logic.

Add due date support. The add command should accept an optional --due flag
with a date in YYYY-MM-DD format. The list command should show due dates
and highlight overdue items. Add a "todo overdue" command that lists only
items past their due date. Update tests to cover the new functionality.

What You Practiced

Use this checklist to map what you did back to the curriculum concepts.

ExerciseCurriculum ConceptModule
Giving the agent architectural instructionsClear prompting with intent and constraints03: Prompting
Reviewing scaffold output for scope creepCritical review of agent output01: Your First Hour
Plan-then-execute for featuresTask decomposition03: Prompting
Implementing one feature at a time with commitsSession architecture and checkpointing07: Session Architecture
Creating and testing CLAUDE.mdProject memory02: Project Memory
A/B testing with and without project memoryMeasuring project memory impact02: Project Memory
Test-first agent patternConstraining agent output with tests04: Hooks and Commands
Structured reflectionCalibrating trust and building intuition01: Your First Hour
Bonus challenges as self-directed agent tasksIndependent agentic developmentAll modules

If you completed all five phases and can check off at least six items in this table, you have a solid foundation. You are ready for the intermediate modules.


title: "Case Study: Solo Developer Builds a SaaS MVP in Two Weeks" last_updated: 2026-03-21 status: experimental difficulty: intermediate tags: [solo-developer, saas, mvp]

Case Study: Solo Developer Builds a SaaS MVP in Two Weeks

Context

  • Project type: SaaS project management tool (kanban boards, task assignments, team dashboards)
  • Team size: 1 developer
  • Tools used: Claude Code, Next.js 14 (App Router), Postgres via Drizzle ORM, Tailwind CSS, Vercel for hosting
  • Duration: 2 weeks (10 working days)
  • Budget: Tight — needed to ship on a bootstrapper's budget with minimal infrastructure cost

The developer — call her Maya — had been freelancing for three years and wanted to build a SaaS product on the side. She had experience with Next.js and Postgres but had never shipped an entire product alone in two weeks. Her previous side projects had stalled at the 60% mark, dragged down by the sheer volume of code a full-stack app requires.

Maya had been using Claude Code for about a month on client work. She was comfortable with the basics: project memory, focused prompts, reviewing diffs. She decided to push the tool harder and see whether agentic workflows could compress a 4-6 week solo build into two.

The Challenge

Building a project management SaaS is not technically novel, but the surface area is large. Even an MVP needs: authentication, team management, project CRUD, kanban boards with drag-and-drop, task creation and assignment, a dashboard with basic metrics, email notifications, and a billing integration. For a solo developer, the bottleneck is not any single feature — it is the accumulated volume of code across the full stack.

Maya needed to ship a usable product in 10 working days. Not a demo. Not a proof of concept. A product that real users could sign up for, create teams, and manage projects.

The Approach

Day 1-2: Foundation and Project Memory

Maya started by spending a full morning writing her CLAUDE.md before writing any application code. This felt counterintuitive — she wanted to start building — but her experience on client projects had taught her that a good project memory file paid for itself within the first day.

Her initial CLAUDE.md covered:

  • Project overview (one paragraph)
  • Tech stack with specific versions
  • Directory structure conventions (where API routes go, where components go, how to organize database schemas)
  • Naming conventions (camelCase for functions, PascalCase for components, snake_case for database columns)
  • The build and test commands
  • A "do not" section: no class components, no raw SQL (use Drizzle), no client-side data fetching for anything that could be server-rendered

She then asked the agent to scaffold the project: Next.js app with App Router, Drizzle ORM configuration, Tailwind setup, and a basic directory structure. The agent handled this in one pass. She reviewed the scaffold, committed, and moved on.

The afternoon went to authentication. This was the one feature she implemented mostly by hand. Auth involves secrets, session management, redirect logic, and security edge cases that she did not trust the agent to get right without heavy review. She used the agent to generate the boilerplate (database schema for users, basic login and signup pages) but wrote the session logic and middleware herself. Estimated split: 40% agent, 60% manual for auth.

Day 3-5: Core Features with Daily Sessions

Maya structured each day around a single major feature area. The pattern was consistent:

  1. Morning planning session (15 min) — She used plan mode to have the agent analyze the next feature and propose an implementation plan. She reviewed the plan, corrected any issues, and approved it.

  2. Implementation session (3-4 hours) — She worked through the plan feature by feature, using focused prompts for each piece. Database schema first, then API routes, then UI components. She committed after each completed piece.

  3. End-of-day review (30 min) — She reviewed all the day's diffs, ran the app manually, and updated CLAUDE.md with anything the agent had gotten wrong that a configuration line could prevent tomorrow.

Day 3: Team management (create teams, invite members, role-based access). Day 4: Project CRUD and the kanban board layout. Day 5: Task creation, assignment, and the drag-and-drop interaction.

The drag-and-drop implementation on Day 5 was the first real struggle. The agent produced a working drag-and-drop using the HTML5 drag API, but the state management for reordering cards across columns was brittle. When a card was dragged between columns, the optimistic UI update occasionally desynced from the server state. Maya spent two hours debugging this with the agent before deciding to handle the state management logic manually while letting the agent generate the UI components and API calls around it.

Day 6-7: Sub-Agents for Parallel Frontend/Backend Work

By Day 6, Maya had enough structure in place that frontend and backend work could proceed independently. She started using sub-agents to parallelize:

  • One agent session worked on the dashboard UI components (charts, metrics cards, layout)
  • Another agent session worked on the backend aggregation queries and API endpoints that powered the dashboard

She used separate terminal windows and kept both sessions focused on their respective domains. The CLAUDE.md file served as the shared contract — both agents read the same conventions, used the same data models, and followed the same patterns. When the frontend agent needed to know the API response shape, it read the Drizzle schema and the API route files the backend agent had created.

This parallel approach cut the dashboard work from an estimated two days to one. The integration was not seamless — the frontend agent assumed a slightly different response format in one endpoint, which took 20 minutes to fix — but the time savings were substantial.

Day 8-9: Polish, Testing, and Edge Cases

Day 8 focused on testing. Maya used the test-first pattern: she asked the agent to write integration tests for the API routes, then ran them and fixed failures. The agent wrote 34 tests across the core endpoints. 28 passed on the first run. The 6 failures revealed real bugs — incorrect error status codes, a missing authorization check on one endpoint, and a race condition in the task reordering logic.

Day 9 was polish: email notifications (the agent handled the templates and sending logic; Maya configured the email provider manually), responsive design fixes, loading states, and error boundaries. This was the kind of high-volume, low-complexity work where the agent excelled. She would point to a page, describe the issue ("this page has no loading state — add a skeleton loader that matches the layout"), and the agent would handle it.

Day 10: Deployment and Launch

The final day was deployment to Vercel, environment variable configuration, a quick smoke test, and writing the landing page. The agent wrote the landing page copy and layout. Maya revised the copy (the agent's version was too generic) but kept the layout. She was live by 3 PM.

What Worked

Project memory evolution. The CLAUDE.md file went from 15 lines on Day 1 to 45 lines by Day 10. Each addition was reactive — something the agent got wrong, turned into a configuration line so it would not happen again. By the second week, the agent's first-pass accuracy was noticeably higher than the first week, not because the agent improved, but because the project memory had accumulated the project's specific patterns.

Checkpoint commits. Maya committed after every completed feature, sometimes multiple times per hour. This meant she could always roll back if the agent went off track. On three occasions, she reverted the agent's work entirely and re-approached the feature with a different prompt. Without frequent commits, those reversions would have been painful.

Plan mode for complex features. For anything involving multiple files or non-obvious architecture (auth, drag-and-drop, dashboard aggregations), Maya used plan mode to review the agent's approach before it wrote any code. This caught two significant architectural issues before they were implemented — once the agent planned to put business logic in a React component, and once it planned a database query that would not scale.

Daily CLAUDE.md updates. Treating the project memory file as a living document — updated at the end of every working day — created a compounding advantage. By Day 5, the agent was producing code that felt like it was written by someone who had been on the project for weeks.

Sub-agents for frontend/backend parallelism. Once the data models were stable, running separate sessions for frontend and backend work roughly doubled throughput for the dashboard feature.

What Didn't Work

Complex state management. The agent consistently struggled with state that involved coordination between multiple components. The kanban drag-and-drop, optimistic UI updates, and real-time dashboard refresh all required manual intervention. The agent could produce code that worked for the simple case but broke on edge cases involving timing, race conditions, or multi-step UI interactions.

Auth required heavy manual work. Despite clear instructions, the agent made security-sensitive decisions that Maya was not comfortable shipping without manual review and modification. Session token handling, CSRF protection, and the password reset flow all needed significant human intervention. For auth in production, the agent was a boilerplate generator, not an implementer.

Agent-written copy was generic. The landing page copy, error messages, and onboarding text all read like they were written by an AI — because they were. Maya rewrote most user-facing copy. The agent is good at generating placeholder text and structuring layouts, but the words themselves lacked the specificity and personality a product needs.

Inconsistent import patterns. Even with CLAUDE.md specifying import conventions, the agent occasionally used different import styles in different files (named imports in one file, default imports in the next). This was cosmetic but annoying to clean up across 40+ files. A hook that ran a linter after each edit would have caught this automatically.

Metrics

MetricValue
Total working days10
Estimated days without agent workflows25-30
Lines of application code~8,200
Percentage agent-written (estimated)~60%
Percentage agent-written then manually revised~15%
Percentage fully manual~25%
Total API cost (Claude Code)~$45
Number of commits67
Number of full reversions3
Tests written34 integration, 12 unit
CLAUDE.md final length45 lines

Key Takeaways

  1. Project memory is your highest-leverage investment as a solo developer. Every line in CLAUDE.md that prevents a common agent mistake saves you correction time across every future session. Start it on Day 1 and update it daily. The compound effect is significant by the end of the first week.

  2. Let the agent handle volume; handle complexity yourself. The agent excelled at high-volume, pattern-following work: CRUD endpoints, UI components, test boilerplate, responsive layouts. It struggled with work that required reasoning about state over time, security implications, or subtle user experience decisions. Knowing which category a task falls into is the key delegation skill.

  3. Plan mode prevents expensive mistakes on multi-file changes. A two-minute plan review is always cheaper than a thirty-minute debugging session. Use plan mode for any feature that touches more than two files or involves non-obvious architectural decisions.

  4. Commit after every completed feature, not at the end of a session. Frequent commits turn agent mistakes from setbacks into minor inconveniences. Three full reversions in 10 days sounds bad, but each one cost under two minutes because the last good commit was never far away.

  5. Sub-agents work when the contract between them is clear. Parallel frontend/backend sessions worked because the database schema and API contracts were defined before the parallel work started. Without that shared contract, the agents would have made incompatible assumptions. The CLAUDE.md file acted as the source of truth both sessions could read.

What We'd Do Differently

  • Set up a linter hook from Day 1. An automated lint check after every agent edit would have caught the inconsistent import patterns and saved a cleanup pass on Day 9.
  • Write test scaffolds before implementation, not after. Using the test-first pattern from Day 1 (instead of saving testing for Day 8) would have caught bugs earlier and given the agent clearer success criteria for each feature.
  • Use a well-tested auth library instead of building from scratch. Even with agent assistance, hand-rolling auth was the most time-consuming and highest-risk part of the project. A library like NextAuth or Lucia would have been faster and safer.
  • Add response schema examples to CLAUDE.md. The frontend/backend desync on the dashboard endpoint could have been prevented by including a sample API response in the project memory file.

Patterns Used

  • Fan-Out Fan-In — for parallel frontend/backend development sessions
  • Plan Mode — for reviewing complex feature approaches before implementation
  • Checkpoint Commits — for maintaining clean rollback points throughout the build
  • Test-First Agent Pattern — for driving implementation through pre-written tests

title: "Case Study: Team Migration from Python 2 to Python 3" last_updated: 2026-03-21 status: experimental difficulty: advanced tags: [team, migration, legacy]

Case Study: Team Migration from Python 2 to Python 3

Context

  • Project type: Internal operations platform (inventory management, order processing, reporting)
  • Team size: 4 developers
  • Tools used: Claude Code, parallel worktrees, custom hooks, shared CLAUDE.md
  • Duration: 3 weeks (projected 6 weeks without agent assistance)
  • Codebase: ~50,000 lines of Python 2.7, 12 major modules, 340 source files, ~60% test coverage

The team — two senior developers and two mid-level — maintained an internal platform that had been running on Python 2.7 since 2016. The Python 2 end-of-life had been acknowledged for years, but the migration kept getting deprioritized. Now a critical dependency was dropping Python 2 support entirely, forcing the issue. The team had a 6-week window before the dependency update was mandatory.

The codebase was typical of long-lived internal tools: mostly well-structured, but with pockets of code that relied on Python 2 idioms — print statements instead of functions, unicode and str type handling, dictionary methods that returned lists instead of views, old-style class definitions, and scattered has_key() calls. More concerning were the dynamic typing patterns: functions that accepted both strings and byte sequences, implicit integer division in financial calculations, and except Exception, e syntax throughout.

No one on the team had done a Python 2 to 3 migration at this scale before.

The Challenge

A 50,000-line migration is not conceptually difficult — most Python 2 to 3 changes are mechanical. The challenge is volume and verification. Mechanical changes across 340 files must all be correct. The 40% of the codebase without test coverage could silently break. And several modules had subtle runtime behavior differences between Python 2 and 3 that automated tools like 2to3 handle incorrectly or not at all.

The team estimated 6 weeks for a manual migration: 2 weeks to convert, 2 weeks to test and fix, 2 weeks as buffer. They had exactly 6 weeks. No margin for error.

The Approach

Week 0: Preparation (2 days, before the migration clock started)

The team lead, David, spent two days before the migration officially began setting up the agentic workflow infrastructure.

Shared CLAUDE.md with migration rules. David wrote a CLAUDE.md that served as the migration rulebook. It was not a general project memory file — it was purpose-built for the migration. It contained:

  • The migration goal (Python 2.7 to Python 3.11)
  • A table of specific transformations: print statements to functions, unicode() to str(), .has_key() to in, dict.iteritems() to dict.items(), old-style classes to new-style, except Exception, e to except Exception as e
  • Rules for the tricky cases: how to handle mixed string/bytes functions, how to convert integer division, how to handle __future__ imports
  • A "do not touch" list: three modules that had been flagged for manual-only migration due to complex runtime behavior
  • The test command and how to run tests for a specific module
  • The commit message convention: migrate(module-name): description

This file was 78 lines — longer than the typical project memory file, but justified by the specialized nature of the work. Every developer on the team and every agent session would read it.

Agent team structure. David defined three agent roles:

  1. Exploration agent — Maps module dependencies, identifies Python 2-specific patterns, and produces a migration difficulty report for each module
  2. Implementation agents — One per developer, each handling assigned modules. Performs the actual code migration.
  3. Review agent — Runs after each module is migrated. Checks for missed Python 2 patterns, runs tests, and flags issues.

Week 1: Exploration and Easy Modules

Day 1: Dependency mapping. David ran the exploration agent across the entire codebase:

Analyze this Python 2 codebase. For each module in src/, report:
1. Python 2-specific patterns found (list each type and count)
2. Dependencies on other internal modules
3. External library dependencies and their Python 3 compatibility
4. Estimated migration difficulty (low, medium, high) with reasoning

Output a structured summary I can use to plan the migration order.

The exploration agent spent about 20 minutes reading files and produced a structured report. It categorized the 12 modules:

  • Low difficulty (5 modules): Mostly print statements and old-style string formatting. Minimal inter-module dependencies. Could be migrated independently.
  • Medium difficulty (4 modules): Mixed string/bytes handling, integer division in calculations, some dynamic typing patterns. Could be migrated independently but needed careful testing.
  • High difficulty (3 modules): Complex runtime behavior, heavy use of __metaclass__, monkey-patching, and dynamic attribute access. These were the "do not touch" modules flagged for manual migration.

Days 2-5: Parallel migration of low-difficulty modules. The team split into pairs. Each pair took 2-3 low-difficulty modules and used parallel worktrees so their agents could work without file conflicts.

The worktree setup was straightforward:

# Each developer creates their own worktree
git worktree add ../migration-alice -b migrate/alice-modules
git worktree add ../migration-bob -b migrate/bob-modules
git worktree add ../migration-carol -b migrate/carol-modules
git worktree add ../migration-david -b migrate/david-modules

Each developer ran their agent in their own worktree with the shared CLAUDE.md. The prompt pattern was consistent:

Migrate src/[module_name]/ from Python 2 to Python 3. Follow the
migration rules in CLAUDE.md exactly. After migration:
1. Run the module's tests with: pytest tests/[module_name]/ -v
2. Fix any test failures caused by the migration
3. Report what you changed and any issues you encountered

The five low-difficulty modules were migrated by end of Day 4. Each migration followed the same flow: agent converts the code, agent runs tests, agent fixes failures, developer reviews the diff and commits. The agents handled the mechanical transformations flawlessly — print statements, has_key(), old-style classes, and iteritems() were all converted correctly without exception.

Day 5 was spent on cross-module integration testing. The team merged all five migration branches and ran the full test suite. Three integration tests failed due to import-order changes — the agents had reordered imports to follow PEP 8 (an improvement, but one that affected test fixtures that depended on import side effects). These were fixed manually in under an hour.

Week 2: Medium-Difficulty Modules and the Review Agent

Days 6-8: Medium-difficulty modules. The four medium-difficulty modules required more guidance. The key difference was the string/bytes handling and integer division.

For these modules, the developers added module-specific context to their prompts:

Migrate src/order_processing/ from Python 2 to Python 3. Follow the
migration rules in CLAUDE.md.

Additional context for this module:
- Functions in invoice.py accept both str and unicode in Python 2.
  In Python 3, convert these to accept only str (text strings).
  Add explicit .encode()/.decode() calls where bytes are needed.
- The tax calculation in pricing.py uses integer division. In Python 3,
  / returns float. Use // for integer division where the original
  behavior must be preserved. Add a comment "# integer division preserved
  from py2" on each line you change.
- Run the full test suite, not just this module's tests, since
  order_processing is imported by reporting and dashboard.

The agents handled most of the medium-difficulty work correctly. The integer division conversions were all correct. The string/bytes handling was mostly correct but needed manual review — in two cases, the agent converted a function to accept only str when it actually needed to handle bytes from a network socket. The developer caught these in review because the function names (read_socket_data, parse_binary_header) made the byte-handling requirement obvious to a human but not to the agent, which was following the general rule from the prompt.

Days 8-9: Review agent pass. After each module was migrated, a review agent scanned the converted code:

Review the Python 2 to 3 migration of src/[module_name]/. Check for:
1. Any remaining Python 2 syntax (print statements, except old syntax,
   has_key, iteritems, etc.)
2. Potential runtime behavior changes (integer division, string/bytes,
   dictionary ordering assumptions)
3. Missing __future__ imports where they should be present
4. Tests that pass but may not be testing the right behavior after
   migration

Report findings as a checklist. Do not make changes.

The review agent caught four issues across the four modules:

  • One file still had a print statement inside a comment that was actually dead code (the agent had converted the print but the line was unreachable)
  • Two tests were passing but testing Python 2 behavior — they asserted that dict.keys() returned a list, which is true in Python 2 but returns a view in Python 3. The tests passed only because the test data had one key, making the comparison incidentally true.
  • One module had a sys.maxint reference that should have been sys.maxsize

Without the review agent, the dict.keys() issue would have been a production bug discovered weeks later. It was the kind of subtle behavioral difference that passes tests but breaks under real data.

Week 3: High-Difficulty Modules and Final Integration

Days 10-12: Manual migration with agent assistance. The three high-difficulty modules were migrated by the senior developers with the agent in a supporting role. The agent handled the mechanical transformations (same as the easy modules), but the developers manually reviewed and often rewrote the complex sections: metaclass usage, monkey-patching patterns, and dynamic attribute construction.

The split was roughly: agent handled 40% of the changes in these modules (the straightforward syntactic conversions), developers handled 60% (the behavioral and architectural changes).

Day 13: Full integration testing. All modules merged into a single branch. Full test suite run. 347 out of 362 tests passed. The 15 failures were:

  • 8 related to string encoding in the high-difficulty modules (fixed by developers)
  • 4 related to dictionary ordering assumptions (Python 3.7+ guarantees insertion order, but the code assumed arbitrary order and used sorted() — the agents had removed the sorted() calls since "dictionaries are ordered in Python 3," which was correct but changed behavior for test assertions)
  • 3 related to changes in exception chaining behavior (Python 3 exception chains revealed by __cause__ attributes the code was not expecting)

All 15 were fixed by Day 14.

Day 15: Cleanup and documentation. The final day was spent removing __future__ imports that were no longer needed, updating the project's CI configuration to use Python 3.11, updating the README, and writing a brief migration document for the team's records.

What Worked

Parallel worktrees for independent modules. This was the single biggest time saver. Four developers migrating modules simultaneously, each in their own worktree, eliminated the serialization bottleneck that would have made this a 6-week project. The worktree approach meant no branch-switching overhead and no file conflicts during the parallel work phase.

Shared CLAUDE.md as a migration rulebook. The 78-line migration-specific CLAUDE.md ensured all four developers' agents followed the same transformation rules. Without it, each agent would have made slightly different decisions about edge cases (integer division handling, string/bytes conversion style, import ordering), creating an inconsistent codebase that would have been harder to review and debug.

The exploration agent for planning. Spending Day 1 on automated codebase analysis produced a migration plan that was more thorough than a manual assessment would have been. The agent found Python 2 patterns in files the team had forgotten about, and its difficulty ratings were accurate for 11 out of 12 modules (it underestimated one medium-difficulty module that turned out to have more dynamic typing than its static analysis revealed).

Hooks for automated py2-to-py3 lint checks. The team set up a hook that ran pyupgrade --py3-plus after every agent edit. This caught several instances where the agent's migration was correct but not idiomatic Python 3 — for example, using dict() constructor calls instead of dict literals, or keeping object as an explicit base class. The hook flagged these automatically, and the agent fixed them in the same session.

The review agent as a second pair of eyes. The dedicated review pass caught four issues that the implementation agents and human reviewers missed. The dict.keys() behavioral difference was particularly valuable — it was a correctness bug hidden by incidental test data.

What Didn't Work

Agents made incorrect assumptions about runtime behavior. The most dangerous mistakes were not syntactic — 2to3 and the agents handled syntax reliably. The dangerous mistakes were behavioral. The agent assumed dict.keys() returning a view was always a drop-in replacement (it is, except when the result is mutated during iteration). The agent assumed str and bytes could be cleanly separated (they could, except in the network modules that genuinely needed both). These behavioral assumptions required human review to catch.

Dynamic typing edge cases needed human judgment. Functions that accepted multiple types in Python 2 (a common pattern in legacy Python code) could not be mechanically converted. The agent followed the rules in CLAUDE.md, but the rules could not cover every case. In the order_processing module alone, there were seven functions where the correct Python 3 type handling depended on understanding the calling code, the data flow, and the business logic — context that the agent did not have even with the project memory file.

The agents occasionally fought the "do not touch" list. Despite the CLAUDE.md explicitly listing three modules as manual-only, two of the implementation agents tried to migrate files in those modules when they encountered import chains that crossed into the forbidden zone. The agents were following a reasonable instinct — fixing imports that would break — but they were violating an explicit constraint. This happened twice and was caught in review, but it highlighted the need for stronger guardrails on constraint enforcement, possibly through hooks that reject edits to specific directories.

Test coverage gaps amplified risk. The 40% of the codebase without tests was the scariest part of the migration. The agents could convert the syntax, but without tests, there was no automated way to verify correctness. The team ended up writing manual test scripts for the untested modules, which consumed most of Day 13. If the codebase had better coverage going in, the entire migration would have been faster and safer.

Review agent missed some cross-module issues. The review agent checked each module in isolation. It did not catch issues that only manifested when multiple migrated modules interacted — like the 4 dictionary-ordering failures that appeared in integration testing. A review prompt scoped to cross-module interactions would have caught these earlier.

Metrics

MetricValue
Total calendar time3 weeks (15 working days)
Original estimate (manual)6 weeks
Lines of code migrated~50,000
Source files modified312 out of 340
Percentage automated by agents~70%
Percentage agent-assisted (human-reviewed and revised)~15%
Percentage fully manual~15%
Total API cost (Claude Code, 4 developers)~$200
Worktrees used simultaneously (peak)4
Tests passing after migration362/362
Bugs caught by review agent4
Post-migration production issues (first 2 weeks)1 (encoding edge case in a rarely-used report)
CLAUDE.md length (migration-specific)78 lines

Key Takeaways

  1. Shared project memory is the coordination mechanism for agent teams. When four developers run independent agent sessions, the CLAUDE.md file is the only thing that keeps them aligned. Invest time in making it specific, accurate, and comprehensive for the task at hand. For a migration, this means explicit transformation rules, not general guidance.

  2. Mechanical changes are the agent's strength; behavioral changes are yours. Agents handle syntactic transformations (print statements, exception syntax, method renames) with near-perfect accuracy. They handle behavioral changes (type semantics, division behavior, iteration side effects) with dangerous confidence — they produce code that looks correct and usually is, but the exceptions are subtle and costly. Review every behavioral change manually.

  3. Worktrees turn a serial migration into a parallel one. Without worktrees, four developers cannot work on the same codebase simultaneously without constant merge conflicts. With worktrees, each developer gets an isolated workspace, the agents cannot interfere with each other, and the merge step is clean because the modules are independent. This is the single biggest force multiplier for team-scale agent work.

  4. A dedicated review agent catches what implementation agents and humans miss. The implementation agent is focused on making changes. The human reviewer is focused on correctness of those changes. Neither is systematically checking for patterns that are technically correct but behaviorally different. A review agent with a specific checklist fills that gap.

  5. Constraint enforcement needs more than instructions — it needs guardrails. Telling an agent "do not touch these modules" in CLAUDE.md is necessary but not sufficient. The agent may follow an import chain into a forbidden module and make changes there. For hard constraints, use hooks that reject edits to protected files or directories. Instructions guide behavior; hooks enforce it.

What We'd Do Differently

  • Write integration-level review prompts, not just module-level ones. The review agent checked modules in isolation. A second review pass scoped to cross-module interactions (shared types, import chains, integration test scenarios) would have caught the dictionary-ordering issues before the full integration test day.
  • Increase test coverage before starting the migration. The 40% coverage gap was the biggest risk factor. Writing tests for the uncovered modules before migrating them would have been slower upfront but would have eliminated the scariest part of the verification phase.
  • Use hooks to enforce the "do not touch" list. A pre-edit hook that rejected changes to the three manual-only modules would have prevented the two constraint violations without relying on human review to catch them.
  • Run the exploration agent per-function, not per-module, for medium and high difficulty modules. The module-level difficulty rating was useful for planning but too coarse for execution. A function-level analysis would have identified the specific functions that needed human attention, letting the agents handle the rest of the module with less supervision.

Patterns Used

  • Fan-Out Fan-In — for parallel module migration across four developers
  • Exploration Agent — for pre-migration codebase analysis and difficulty assessment
  • Review Agent — for post-migration verification of each module
  • Shared Project Memory — for consistent migration rules across all agent sessions
  • Parallel Worktrees — for isolated, conflict-free parallel editing

Glossary

Terms used throughout this curriculum, defined precisely.


Agent — An AI system that can take autonomous actions (reading files, running commands, making edits) to accomplish a task, as opposed to only generating text responses. In this curriculum, agents are AI coding assistants like Claude Code and Codex CLI that operate within your development environment.

Agentic Development — The practice of building software with AI agents as active collaborators that read, write, and execute code autonomously, rather than using AI only for text completion or suggestions.

AGENTS.md — A configuration file used by Codex CLI (and other tools adopting the open standard) to provide project-specific instructions, conventions, and context to AI agents. Analogous to CLAUDE.md for Claude Code.

Anti-Pattern — A commonly used approach that appears effective but actually leads to poor outcomes — wasted tokens, broken workflows, or unreliable results.

Battle-Tested — The highest maturity rating for patterns in this repository. Indicates the pattern has been validated in production by multiple teams with edge cases documented.

CLAUDE.md — A configuration file placed in a project's root (or subdirectories) that provides Claude Code with project-specific instructions, architecture context, coding conventions, and workflow guidance. The single highest-leverage tool for improving agent performance.

Context Window — The maximum amount of text (measured in tokens) that an AI agent can process in a single session. Managing what goes into this window is a core agentic development skill.

Experimental — The lowest maturity rating for patterns. Indicates the pattern has been tried by 1-2 people and shows promise but lacks broad validation.

Fan-Out/Fan-In — An orchestration pattern where a parent agent delegates multiple independent sub-tasks to child agents (fan-out), then collects and synthesizes their results (fan-in).

Headless Mode — Running an AI coding agent without an interactive terminal session, typically in CI/CD pipelines or automation scripts. The agent receives instructions programmatically and returns results without human interaction.

Hook — A script or command that runs automatically in response to agent events (before/after tool calls, on session start/end, etc.). Used to enforce standards, add logging, or customize agent behavior.

MCP (Model Context Protocol) — An open protocol that allows AI agents to connect to external tools, data sources, and services. MCP servers expose capabilities that agents can discover and use.

Orchestration — The coordination of multiple agents or agent sessions to accomplish a complex task. Includes patterns like pipelines, hierarchies, swarms, and fan-out/fan-in.

Pattern — A named, reusable workflow for accomplishing a specific type of task with AI agents. Each pattern in this repository includes when to use it, when not to use it, and a runnable example.

Plan Mode — A feature in Claude Code where the agent analyzes a task and proposes an approach before making changes. Useful for complex tasks where you want to review the strategy before execution.

Project Memory — The collective set of configuration files (CLAUDE.md, AGENTS.md, etc.) that give an AI agent persistent context about your project across sessions.

Proven — The middle maturity rating for patterns. Indicates the pattern has been used successfully by multiple practitioners.

Sandbox — In this curriculum, a self-contained directory within a module that contains a runnable example demonstrating the module's concepts. Also refers to Codex CLI's sandboxed execution environment.

Session — A single continuous interaction between a developer and an AI agent. Session architecture (how you structure, scope, and connect sessions) is a key skill taught in Module 07.

Sub-Agent — A child agent spawned by a parent agent to handle a delegated sub-task. The parent agent coordinates the sub-agent's work and integrates its results.

Token — The basic unit of text processed by AI models. Roughly 4 characters or 0.75 words in English. Token usage directly affects cost and context window consumption.

Worktree — A Git feature that creates an additional working directory linked to the same repository. Used in agentic development to give agents isolated workspaces for parallel tasks without branch-switching conflicts.

Contributing to The Agentic Developer's Playbook

Thank you for helping build the canonical resource for agentic software engineering. This guide explains how to contribute effectively.

Contribution Tiers

Tier 1: Low Barrier (no prior approval needed)

Just open a PR:

  • Typo fixes and grammar corrections
  • Broken link repairs
  • New config examples in reference/configs/
  • New prompt templates in reference/prompts/
  • Updates to tested_with version pins in tool-specific files
  • Minor clarifications that don't change meaning

SLA: Merged within 48 hours.

Tier 2: Medium Barrier (review required)

Open a PR using the appropriate template:

  • New patterns in reference/patterns/ (must use _TEMPLATE.md)
  • New anti-patterns in reference/anti-patterns/ (must use _TEMPLATE.md)
  • New case studies (must use _TEMPLATE.md)
  • Version migration updates to curriculum modules
  • New troubleshooting entries
  • New entries in GLOSSARY.md

SLA: First review within 72 hours.

Tier 3: High Barrier (RFC process)

Open a GitHub Discussion (RFC category) before writing code:

  • New curriculum modules
  • Structural changes to the repository
  • Changes to the contribution model
  • New practice projects

SLA: Discussion opens within 1 week.

Content Standards

YAML Frontmatter (required on all content files)

---
title: Your Content Title
tested_with:
  claude-code: "1.0.x"    # omit if not tool-specific
  codex-cli: "0.2.x"      # omit if not tool-specific
last_updated: 2026-03-21
status: experimental       # experimental | proven | battle-tested
difficulty: intermediate   # beginner | intermediate | advanced
prerequisites: []          # list of module IDs, e.g., [02-project-memory]
---

Writing Style

  • Second person — address the reader as "you"
  • Lead with why — explain the reason before the steps
  • Concrete over abstract — show real examples, not theoretical descriptions
  • Define terms on first use — link to GLOSSARY.md for formal definitions
  • Copy-paste ready — every code block should work when pasted, with "Adapt this by..." notes
  • Include cost estimates — note rough token costs for expensive workflows

The Dual-Layer Rule

  • concepts.md = tool-agnostic principles (should rarely change)
  • claude-code.md / codex-cli.md = tool-specific implementations (updated with tool releases)
  • Never put tool-specific instructions in concepts.md

Pattern Maturity

Don't self-rate as "battle-tested." New submissions start as experimental. Maturity is earned through community validation:

  • Experimental — you've used it, it works for you
  • Proven — multiple contributors have validated it
  • Battle-Tested — used in production by teams, edge cases documented

File Naming

  • All lowercase
  • Hyphens, not underscores: fan-out-fan-in.md not fan_out_fan_in.md
  • Descriptive names: god-session.md not ap-001.md
  • Templates always named _TEMPLATE.md

Before Submitting

  • YAML frontmatter is present and complete
  • last_updated reflects today's date
  • tested_with versions are current (for tool-specific content)
  • All code examples are tested and runnable
  • Internal links use relative paths
  • No tool-specific content in concepts.md files
  • New terms added to GLOSSARY.md

Reporting Issues

Code of Conduct

Be respectful, be constructive, be honest. This project values accuracy over cheerleading — if something doesn't work, say so. If a pattern has limitations, document them.

Roadmap

V1 — Credibility MVP (Current)

Goal: A developer spends 2 hours with this repo and immediately changes how they work.

  • Repository structure and infrastructure
  • CLAUDE.md (self-demonstrating)
  • Contribution model and templates
  • Modules 00-05 (Foundation + early Intermediate) — full content
  • Modules 06-11 — scaffolded with outlines
  • 10+ workflow patterns in reference/patterns/
  • 10+ anti-patterns in reference/anti-patterns/
  • Cheatsheets for Claude Code and Codex CLI
  • Practice Project 01: CLI Todo App
  • 2 case studies (hypothetical)
  • CI: markdown lint, link checking, frontmatter validation

V2 — Community Growth (Months 2-5)

Goal: Transition from "one person's resource" to "the community's resource."

  • Modules 06-09 — full content
  • 3-5 real-world case studies
  • 2 more practice projects
  • Launch GitHub Discussions
  • Pattern maturity rating system live
  • First external contributors with section ownership
  • Decision tree navigator ("I want to [X]" routing)
  • Automated staleness detection (90-day flag)

V3 — Category Definition (Months 6-12)

Goal: Become the center of gravity for agentic development practice.

  • Modules 10-11 — full content
  • Advanced orchestration architectures (swarms, pipelines, hierarchies)
  • Enterprise/team patterns
  • Reusable template repositories ("start a project with best-practice setup")
  • Community hook and MCP server registry
  • Conference talk frameworks
  • "State of Agentic Development" annual survey
  • Partnerships: tool vendor docs link here

title: Content Style Guide last_updated: 2026-03-21

Content Style Guide

Standards for writing content in the Agentic Developer's Playbook.

Voice and Tone

  • Second person — always address the reader as "you"
  • Direct and practical — lead with what to do, then explain why
  • Honest — document limitations, costs, and failure modes alongside successes
  • Opinionated — recommend specific approaches rather than listing all options equally

Structure

Curriculum Modules

Each module follows the dual-layer structure:

  • concepts.md — tool-agnostic principles (~1500 words max)

    • Opens with the core question the module answers
    • Explains the mental model or principle
    • Includes diagrams for non-trivial concepts
    • Ends with key takeaways (3-5 bullet points)
  • {tool}.md — tool-specific implementation

    • Opens with prerequisites (tool version, setup needed)
    • Step-by-step instructions with copy-paste-ready commands
    • Screenshots or output examples where helpful
    • "Adapt this by..." notes on every code block
  • exercises.md — hands-on practice

    • 3-5 exercises per module, increasing in difficulty
    • Each exercise has: objective, steps, expected outcome, hints
    • At least one exercise uses the reader's own project, not a toy example

Reference Entries

  • Use the appropriate template from the same directory
  • Every pattern has "When to Use" and "When NOT to Use" sections
  • Every anti-pattern explains "Why Developers Do This" (empathy before correction)
  • Include token cost estimates for expensive workflows

Formatting

Headings

  • H1 (#) — document title only (one per file)
  • H2 (##) — major sections
  • H3 (###) — subsections
  • Don't skip heading levels

Code Blocks

  • Always specify the language for syntax highlighting
  • Every code block must be copy-paste ready
  • Add # Adapt this by: ... comments for parts the reader should customize
  • For multi-step procedures, use separate code blocks (not one giant block)
  • Internal links: always use relative paths (../reference/patterns/fan-out-fan-in.md)
  • External links: use descriptive text, not raw URLs
  • Link to GLOSSARY.md on first use of a term within a document

Tables

  • Use for comparisons, feature matrices, and structured data
  • Keep tables under 6 columns — wider tables are hard to read
  • Left-align text columns, center-align status/boolean columns

Admonitions

Use blockquotes with bold prefixes for callouts:

Note: Additional context that's helpful but not critical.

Warning: Something that could cause problems if ignored.

Cost: Token cost information for the described workflow.

YAML Frontmatter

Required on every content file:

---
title: Document Title
tested_with:           # omit for tool-agnostic content
  claude-code: "1.0.x"
  codex-cli: "0.2.x"
last_updated: 2026-03-21
status: experimental   # experimental | proven | battle-tested
difficulty: beginner   # beginner | intermediate | advanced
prerequisites: []      # module IDs, e.g., [02-project-memory]
---

Terminology

  • Use terms as defined in GLOSSARY.md
  • Define new terms on first use with a brief inline definition
  • Add significant new terms to the glossary via PR
  • Avoid jargon that isn't in the glossary — if you need it, add it

title: Content Lifecycle last_updated: 2026-03-21

Content Lifecycle

How content moves from idea to published in this repository.

Stages

  Intake          Draft           Review          Published       Maintenance
  ------          -----           ------          ---------       -----------
  New info        Author writes   Peer review     Merged to       Staleness
  arrives in      content using   + CI checks     main branch     monitoring
  meta/intake/    template                                        + updates

1. Intake

New information enters through meta/intake/ using the intake template. Sources include:

  • Tool release notes and changelogs
  • Community pattern discoveries
  • Conference talks and blog posts
  • Reader feedback and issue reports

Weekly triage: Intake items are reviewed weekly and either:

  • Assigned to update an existing module/entry
  • Queued for a new reference entry or case study
  • Deferred with a documented reason

2. Draft

The author (contributor or maintainer) writes content using the appropriate template:

  • Curriculum modules: create all files (concepts.md, tool-specific, exercises.md)
  • Reference entries: use the _TEMPLATE.md in the target directory
  • Case studies: use case-studies/_TEMPLATE.md

Checklist before submitting for review:

  • YAML frontmatter complete
  • Follows style guide
  • All code examples tested
  • "When NOT to use" section included (for patterns)
  • Internal links use relative paths
  • Terms defined on first use

3. Review

Pull request review checks:

  • Automated (CI): Markdown lint, link validation, frontmatter schema, prose lint
  • Human: Accuracy, clarity, completeness, adherence to dual-layer rule

Review criteria:

  • Is the content accurate and tested?
  • Does it follow the dual-layer split (concepts vs tool-specific)?
  • Are the code examples copy-paste ready?
  • Would a developer at the stated difficulty level understand this?

4. Published

Once merged to main, content is live. The last_updated field in frontmatter records when it was last modified.

5. Maintenance

Staleness Detection

Content is flagged for review when:

  • last_updated is more than 90 days old
  • tested_with version is more than 2 minor versions behind the current tool release
  • A tool-update issue is filed that affects the content

Update Process

  1. Tool update issue is filed (manually or via automated monitoring)
  2. Affected files are identified from the issue
  3. Tool-specific files (claude-code.md, codex-cli.md) are updated
  4. concepts.md is updated only if the mental model changes
  5. tested_with and last_updated are bumped
  6. Normal review process applies

Deprecation

Content that is no longer relevant (e.g., a pattern superseded by a better approach):

  1. Add a deprecation notice at the top of the file
  2. Link to the replacement content
  3. Document the change in CHANGELOG.md under "Pattern Evolution"
  4. Keep the file for 90 days, then remove

title: Version Policy last_updated: 2026-03-21

Version Policy

How this repository handles tool version changes for Claude Code, Codex CLI, and other covered tools.

Principles

  1. Concepts are stable; implementations change. Tool-agnostic concepts in concepts.md should rarely need updating. Tool-specific files absorb the churn.
  2. Version-pin everything. Every tool-specific example includes the version it was tested with.
  3. Stale is worse than missing. Outdated examples that silently fail erode trust faster than gaps in coverage.
  4. Update the layer, not the whole module. When Claude Code ships a new feature, update claude-code.md — don't rewrite concepts.md.

Version Tracking

Frontmatter

Every tool-specific file includes:

tested_with:
  claude-code: "1.0.x"
  codex-cli: "0.2.x"

Use semver ranges:

  • "1.0.x" — tested with any 1.0.x patch release
  • "1.0.3" — tested with exactly this version (use when a specific version matters)

Staleness Threshold

Content is flagged when tested_with is 2+ minor versions behind the current release.

Example: If Claude Code is at 1.3.0 and a file says tested_with: "1.0.x", it gets flagged.

When a New Version Ships

Minor Releases (e.g., 1.1.0 to 1.2.0)

  1. File a tool-update issue listing affected modules
  2. Triage: which files need changes vs. which are unaffected?
  3. Update affected tool-specific files
  4. Bump tested_with and last_updated
  5. If a pattern changes: document in CHANGELOG.md under "Pattern Evolution"

Major Releases (e.g., 1.x to 2.x)

  1. Create a tracking issue for the full migration
  2. Audit all tool-specific files for breaking changes
  3. Update files in priority order: Foundation modules first, then Intermediate, then Advanced
  4. If concepts change (rare): update concepts.md with clear before/after explanation
  5. Add migration notes to the relevant module

New Tool Added

  1. RFC discussion in GitHub Discussions
  2. Create {tool}.md files in each relevant module
  3. Add to cheatsheets
  4. Add to GLOSSARY.md
  5. Update README.md curriculum tables

Supported Tools

ToolCurrent Tested VersionCoverage
Claude Code1.0.xFull (Modules 00-11)
Codex CLI0.2.xFull (Modules 00-11)

End-of-Support

If a covered tool is deprecated or discontinued:

  1. Add a notice to the top of each affected file
  2. Keep content for 6 months for historical reference
  3. Archive (move to an archive/ directory) after 6 months