
Codex CLI review loops

A Codex CLI workflow guide covering AGENTS.md, MCP boundaries, and verification loops that teams can use to ship reviewable production work.

Trees on a Rocky Hillside, landscape painting by Asher B. Durand (1844).
Rogier Muller | May 22, 2026 | 6 min read

A Codex CLI review loop is the small, repeatable cycle that turns an agent run into a diff your team can trust: clear repo rules, scoped connectors, and a test that proves the change before anyone merges. Codex CLI, OpenAI's coding agent, already gives you the surfaces for this. The work is binding them into one habit instead of three.

Most teams do not stall because Codex lacks a feature. They stall because the output is hard to review, hard to verify, and quick to drift from repo conventions once the first prompt lands. A stronger prompt does not fix that. A tighter loop does.

Write repo rules where Codex will find them

Codex reads instruction files in the repo, so that is where durable rules belong. Put your team conventions in AGENTS.md at the root, use nested AGENTS.md files for folders with their own rules, and keep short-lived exceptions in AGENTS.override.md when you genuinely need them.

The reason is simple. Rules buried in a chat thread vanish on the next handoff. Rules in a file stay with the code. If a convention has come up twice in review, it has earned a line in AGENTS.md.

This also shrinks review noise. You stop leaving comments about style or structure that a written rule would have caught on the first run. The Codex quickstart covers where these files live.
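
As a minimal sketch of the layout described above, the function below scaffolds a root AGENTS.md and one nested AGENTS.md. The function name `scaffold_agents_md`, the folder you pass in, and the rule text inside each file are placeholders for illustration, not anything Codex requires; replace them with your team's real conventions.

```shell
# Sketch: create a root AGENTS.md plus one nested AGENTS.md.
# The rule text below is placeholder content, not a Codex requirement.
scaffold_agents_md() {
  local repo_root="$1"    # path to the repo checkout
  local nested_dir="$2"   # subfolder with its own rules, relative to repo_root

  mkdir -p "$repo_root/$nested_dir"

  # Root file: conventions that apply to the whole repo.
  # Skipped when a file already exists, so real rules are never overwritten.
  if [ ! -f "$repo_root/AGENTS.md" ]; then
    cat > "$repo_root/AGENTS.md" <<'EOF'
# Repo rules

- Run `pnpm test` and `pnpm type-check` before finishing a task.
- Keep each change to one purpose.
EOF
  fi

  # Nested file: rules that only matter inside this folder.
  if [ ! -f "$repo_root/$nested_dir/AGENTS.md" ]; then
    cat > "$repo_root/$nested_dir/AGENTS.md" <<'EOF'
# Local rules

- Date helpers live in this folder; do not duplicate them elsewhere.
EOF
  fi
}
```

Calling `scaffold_agents_md . src/dates` from the repo root would create both files and leave any existing AGENTS.md alone.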

Scope every MCP connector before the first run

This is the common Codex CLI MCP trap: a team wires in an MCP server or another connector, then lets it reach systems the task never needed. Before the first run, write down what the connector may read, what it may write, and what it must never touch.

MCP, the Model Context Protocol, is the open standard Codex uses to talk to outside systems like databases, issue trackers, and internal tools. Scope is part of that contract, not a thing you bolt on later. The protocol specification spells out the permission model.

When the boundary is written down, the diff explains itself in review. Every connector gets a permission story, not just a URL.
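
One way to make that written boundary checkable is a small scope note per connector plus a script that refuses to proceed when a field is missing. The note format (`connector:`, `read:`, `write:`, `never:` lines) and the function name `check_scope_note` are assumptions made up for this sketch; neither comes from Codex or the MCP specification.

```shell
# Sketch: verify that a connector's scope note states all four fields.
# Assumed note format, one field per line:
#   connector: issue-tracker
#   read: open issues in one project
#   write: comments on the linked issue
#   never: user records, billing data
check_scope_note() {
  local note="$1"
  local missing=0
  local field
  for field in connector read write never; do
    # A field counts only if something non-blank follows the colon.
    if ! grep -Eq "^${field}:[[:space:]]*[^[:space:]]" "$note"; then
      echo "scope note is missing: ${field}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A connector with no write access would still carry an explicit line such as `write: none`, so the reviewer sees that the omission was deliberate.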

End each task with a check that proves it

Codex CLI can run work headlessly, but a headless run is not a verified change. Close every task with a command or test that proves the change in the repo, plus one line on what passed and what still needs a human eye.

Here is the shape of a task that verifies itself:

# Run the change, then prove it landed
codex exec "fix the date parser in src/dates.ts so it handles DST"

# The task should finish by running the repo's own checks
pnpm test src/dates.test.ts
pnpm type-check

The Codex CLI docs and the features page describe this kind of run for a reason: automation is only useful when it lands in a state someone can review. No proof, no merge.
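
The "one line on what passed" can be produced by the same script that runs the checks. The helper below is a sketch under that assumption: `run_checks` is a hypothetical name, and each argument is one full command line taken from your own repo.

```shell
# Sketch: run each verification command and print a one-line summary.
# Each argument is a complete command line, e.g. "pnpm type-check".
run_checks() {
  local passed="" failed="" cmd
  for cmd in "$@"; do
    # Command output goes to stderr so stdout carries only the summary.
    if sh -c "$cmd" >&2; then
      passed="${passed:+$passed, }$cmd"
    else
      failed="${failed:+$failed, }$cmd"
    fi
  done
  echo "passed: ${passed:-none} | failed: ${failed:-none}"
  # Non-zero exit when anything failed, so CI can block the merge.
  [ -z "$failed" ]
}
```

After a `codex exec` run, calling `run_checks "pnpm test src/dates.test.ts" "pnpm type-check"` would run both checks even if the first fails, then return a failing status if either did.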

Make the diff explain itself

Teams lose trust in agent work fastest when a diff arrives with no context. Require a short note next to the change: what it intended, which files it touched, which verification ran, and any MCP access it used. Keep that note in the pull request, not in a side conversation.
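
A note like that is easy to generate rather than write by hand. The sketch below assumes a hypothetical helper named `pr_note` that reads changed file paths on standard input and takes the intent, the verification result, and any MCP access as arguments; the field labels are illustrative.

```shell
# Sketch: print a short review note for the pull request description.
# Changed file paths are read from stdin, one per line.
pr_note() {
  local intent="$1"
  local verification="$2"
  local mcp_access="${3:-none}"   # defaults to "none" when no connector was used
  local file

  echo "Intent: $intent"
  echo "Files touched:"
  # The extra test keeps a final line that lacks a trailing newline.
  while IFS= read -r file || [ -n "$file" ]; do
    if [ -n "$file" ]; then
      echo "  - $file"
    fi
  done
  echo "Verification: $verification"
  echo "MCP access: $mcp_access"
}
```

For example, `git diff --name-only main...HEAD | pr_note "fix DST handling" "pnpm test src/dates.test.ts passed"` would emit a note ready to paste into the pull request, assuming `main` is your base branch.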

Use this as a starter review gate for Codex CLI work, including runs that touch MCP. Paste it into your PR template:

# Codex integration checklist

- [ ] `AGENTS.md` exists at the repo root.
- [ ] Any nested `AGENTS.md` or `AGENTS.override.md` files are intentional.
- [ ] The task states the expected files, tests, and exit condition.
- [ ] Any MCP connector is named and scoped to the minimum needed access.
- [ ] The run includes a verification command or test result.
- [ ] The final diff is small enough to review in one sitting.
- [ ] The summary says what changed, what was verified, and what still needs human judgment.
- [ ] If a skill was used, its description makes clear why it fits the task.
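
Some of these items can be checked mechanically. As one sketch, the "small enough to review" item could be approximated by a line budget over `git diff --numstat` output. The function name `diff_within_budget` and any threshold you pass are assumptions; pick a number that matches what your team can actually read in one sitting.

```shell
# Sketch: fail when a diff exceeds a line budget.
# Reads `git diff --numstat` output on stdin: "<added>\t<deleted>\t<path>".
diff_within_budget() {
  local max_lines="$1"
  local total
  # Binary files report "-" for both counts, so only numeric rows are summed.
  total="$(awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { sum += $1 + $2 } END { print sum + 0 }')"
  echo "changed lines: $total (budget: $max_lines)"
  [ "$total" -le "$max_lines" ]
}
```

Run as `git diff --numstat main...HEAD | diff_within_budget 400`, this prints the total and exits non-zero when the change is over budget; 400 here is an arbitrary example value.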

The good question in review is not "did the model finish?" It is "can another engineer trust this diff without a second conversation?" That is the bar to hold, and it is the same bar you would set for a human contributor.

Common questions

  • What is AGENTS.md?

    AGENTS.md is a plain markdown file at the repo root where you write the rules Codex should follow: build commands, test commands, code conventions, and anything an agent would otherwise guess. Codex reads it automatically on each run. Nested copies in subfolders apply local rules, and AGENTS.override.md holds temporary exceptions.

  • Do I need MCP to use Codex CLI?

    No. Codex CLI runs fine against just your repo and shell. MCP is for when a task needs to reach an outside system, like a database or issue tracker, through a connector. Add it only when the task requires it, and scope each connector to the minimum access before the first run.

  • How do I keep Codex diffs small enough to review?

    Give one task one job. Asking for code, docs, tests, and connector work in a single prompt produces a mushy diff. Split repeated work into a skill or short task file using the SKILL.md pattern, then let the CLI handle the repo-specific part. Smaller scope in means a clearer diff out.

  • Can Codex CLI run without a human watching?

    Yes, Codex CLI can execute headlessly, which is useful in CI or scripts. But a headless run still needs a verification step that proves the change, plus a summary for the human who reviews the result. Headless means unattended execution, not unattended merging.

  • Where do slash commands fit in?

    Slash commands are shortcuts for repeated Codex CLI actions inside a session, so you do not retype the same instruction. They pair well with AGENTS.md: the file holds the durable rules, the command triggers a common task. The slash commands reference lists what ships by default.

Start with one repo

Pick a single repo, add the integration checklist to its PR template, and run your next Codex task through it. Then move what works into your shared Codex CLI workflows page so the next review is faster than the last.
