Codex CLI workflows for reviewable diffs

A workflow guide to OpenAI's Codex CLI: AGENTS.md, MCP, and verification loops that produce reviewable diffs.

Vale of Kashmir, landscape painting by Robert S. Duncanson (1867).
Rogier Muller · May 25, 2026 · 6 min read

The fastest way to get a Codex diff merged is to make it easy to trust before review. Codex CLI, OpenAI's coding agent, is good at writing code, but a reviewable diff comes from the workflow around it: a written rule file, a check that runs after each change, and a short handoff. A reviewable diff is one a teammate can approve by reading the change and its proof, without replaying your whole chat.

Most teams reach for more autonomy first. They let the agent roam and ask for a summary at the end. That tends to produce long patches, fuzzy ownership, and review comments that keep circling the same gap. The fix is not less help from the tool. It is clearer boundaries around it.

Write an AGENTS.md your team will actually follow

The most common failure is instruction drift. The agent edits the right files but misses your house style, your test command, or a local architecture rule. That happens because the rule lived in someone's head, not in the repo.

Put durable rules in AGENTS.md at the repo root. Add a nested AGENTS.md where the scope changes, like a package with its own conventions. The test is simple: if a reviewer cannot point to the rule file, the agent probably could not find it either.

Keep it short and concrete. A rule the agent can act on beats a paragraph of intent.
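The root-plus-nested layout can be pictured as a lookup that walks from the repo root down to the file being edited. The Python sketch below only illustrates that scoping idea. It is not Codex's own discovery logic, and `applicable_rule_files` and the example paths are hypothetical names chosen for this example.

```python
from pathlib import Path


def applicable_rule_files(repo_root: Path, target: Path, name: str = "AGENTS.md") -> list[Path]:
    """Return rule files that cover `target`, ordered from repo root to nearest folder.

    Illustrative helper (hypothetical): it mirrors the "root file plus nested
    files" convention so a reviewer can see which rules apply to a given path.
    """
    repo_root = repo_root.resolve()
    target = target.resolve()
    start = target if target.is_dir() else target.parent

    # Folder names between the repo root and the target's directory.
    parts = start.relative_to(repo_root).parts

    found = []
    current = repo_root
    for part in ("",) + parts:
        if part:
            current = current / part
        candidate = current / name
        if candidate.is_file():
            found.append(candidate)
    return found


if __name__ == "__main__":
    # Hypothetical path: list the rule files that cover one handler.
    for rule_file in applicable_rule_files(Path("."), Path("packages/billing/handler.py")):
        print(rule_file)
```

If the list comes back empty for a file the agent just edited, the rule that reviewers expected was never written down where the agent could find it.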

Bound the task so the diff stays small

A vague prompt invites a sprawling patch. Ask Codex to "improve auth" and you get edits across ten files. Ask it to "update this handler, then run the test file that covers it" and you get a diff you can read in one sitting.

So bound the work, then bound the proof. Request the change, then the check, then the result, in that order. The CLI features and slash-command docs make this loop explicit. A useful rule of thumb: if the change cannot be verified in one or two commands, it is too large for one pass. Split it.
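One way to make "too large for one pass" concrete is to measure the diff before asking for review. The sketch below parses the output of `git diff --numstat` and compares it with a size budget. The thresholds (`MAX_FILES`, `MAX_LINES`) and function names are assumptions for illustration, not values from Codex or from any team standard.

```python
import subprocess

# Hypothetical budget; tune to what your reviewers can read in one sitting.
MAX_FILES = 3
MAX_LINES = 150


def parse_numstat(text: str) -> list[tuple[int, int, str]]:
    """Parse `git diff --numstat` output into (added, deleted, path) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        # Binary files report "-" for both counts; treat them as zero lines.
        rows.append((
            0 if added == "-" else int(added),
            0 if deleted == "-" else int(deleted),
            path,
        ))
    return rows


def diff_is_reviewable(rows, max_files: int = MAX_FILES, max_lines: int = MAX_LINES) -> bool:
    """True when the diff fits the file and line budget."""
    total_lines = sum(added + deleted for added, deleted, _ in rows)
    return len(rows) <= max_files and total_lines <= max_lines


def working_tree_rows() -> list[tuple[int, int, str]]:
    """Read the unstaged diff of the current repository."""
    result = subprocess.run(
        ["git", "diff", "--numstat"],
        capture_output=True, text=True, check=True,
    )
    return parse_numstat(result.stdout)


if __name__ == "__main__":
    rows = working_tree_rows()
    print("reviewable" if diff_is_reviewable(rows) else "split this change")
```

A size budget is a blunt proxy. It does not replace the rule of thumb about one or two verification commands, but it catches the sprawling patch early.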

Make MCP connectors explicit boundaries

MCP is the protocol Codex uses to reach outside systems like a database, an issue tracker, or your CI. Its value is that it turns those systems into named boundaries instead of hidden behavior. That only helps if someone reviews the connector's scope before it becomes the default.

Write a short boundary note: which systems the agent may touch, what it may read, and what needs a human to approve. Keep it short enough that people will actually read it. An open connector you never reviewed is a risk you cannot see.
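A boundary note can also be kept as data, so it is diffable and reviewable like any other change. The sketch below is one possible shape under stated assumptions: it does not configure Codex or MCP, and the system names, action strings, and `decide` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectorBoundary:
    """One entry in the boundary note (hypothetical structure)."""
    system: str
    allowed: frozenset = frozenset()         # actions the agent may take unprompted
    needs_approval: frozenset = frozenset()  # actions that require a human sign-off


# Example note; the systems and actions are made up for illustration.
BOUNDARIES = {
    "issue-tracker": ConnectorBoundary(
        system="issue-tracker",
        allowed=frozenset({"read:issues"}),
        needs_approval=frozenset({"write:comments"}),
    ),
    "ci": ConnectorBoundary(
        system="ci",
        allowed=frozenset({"read:build-logs"}),
    ),
}


def decide(boundaries: dict, system: str, action: str) -> str:
    """Return 'allow', 'ask', or 'deny' for an action on a system.

    Anything not listed is denied, so an unreviewed connector stays closed.
    """
    boundary = boundaries.get(system)
    if boundary is None:
        return "deny"
    if action in boundary.needs_approval:
        return "ask"
    if action in boundary.allowed:
        return "allow"
    return "deny"


if __name__ == "__main__":
    print(decide(BOUNDARIES, "issue-tracker", "write:comments"))
```

The deny-by-default choice is the point of the design: a system that nobody wrote into the note is a system nobody reviewed.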

Leave a handoff the reviewer can audit

When work leaves the chat, the evidence should come with it. A good Codex run ends with three things: the changed files, the commands it ran, and the one place the reviewer should look first.

A summary alone is not enough, because a summary is not auditable. The changed files plus the command output are. This is the part you can standardize across every Codex task, whether the change came from the CLI, a skill, or a connector-backed run.
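The three-part handoff described above can be standardized with a small formatter. This is a minimal sketch, assuming the changed files and command results have already been collected. `render_handoff`, its field labels, and the sample values are hypothetical, not a format Codex produces.

```python
def render_handoff(changed_files, checks, look_first):
    """Format a review handoff as plain text.

    changed_files: list of paths touched by the run
    checks: list of (command, result) pairs, in the order they were run
    look_first: the one place the reviewer should inspect first
    """
    lines = ["Changed files:"]
    lines += [f"- {path}" for path in changed_files]
    lines.append("Commands run:")
    lines += [f"- {command} -> {result}" for command, result in checks]
    lines.append(f"Look first: {look_first}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample values for illustration only.
    print(render_handoff(
        changed_files=["src/auth/handler.py"],
        checks=[("pytest tests/test_handler.py", "passed")],
        look_first="src/auth/handler.py, the token expiry branch",
    ))
```

Keeping the commands as literal strings matters: a teammate can paste them and rerun the proof instead of trusting the summary.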

A starter AGENTS.md you can paste

Drop this into a repo that wants Codex work to stay reviewable. It gives reviewers the three things they need: scope, proof, and a place to start.

# AGENTS.md

## Repo rule
- Make the smallest change that solves the task.
- Prefer existing patterns over new abstractions.
- Do not edit unrelated files.

## Verification loop
- After each change, run the narrowest test or lint command that proves the edit.
- If the command fails, fix the cause before expanding scope.
- Include the command and result in the final handoff.

## Review handoff
- List changed files.
- State what was verified.
- Call out any follow-up risk or skipped check.

Treat it as a habit, not a ceremony. The point is that every run produces the same shape of evidence, so reviewers stop guessing.
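If you want the habit to survive edits to the rule file, a small check can confirm the starter sections are still present. The sketch below assumes the three section headings from the starter file above. The `missing_sections` helper is hypothetical and checks only headings, not the quality of the rules under them.

```python
import re
from pathlib import Path

# Section headings taken from the starter AGENTS.md.
REQUIRED_SECTIONS = ("Repo rule", "Verification loop", "Review handoff")


def missing_sections(text: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Return the required level-two headings that the text does not contain."""
    headings = {
        match.group(1).strip()
        for match in re.finditer(r"^##\s+(.+)$", text, re.MULTILINE)
    }
    return [section for section in required if section not in headings]


if __name__ == "__main__":
    rule_file = Path("AGENTS.md")
    if rule_file.is_file():
        gaps = missing_sections(rule_file.read_text())
        print("ok" if not gaps else f"missing sections: {', '.join(gaps)}")
    else:
        print("no AGENTS.md at this path")
```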

Common questions

  • What is AGENTS.md and where does it go? AGENTS.md is a plain markdown file of durable rules Codex reads before it works in your repo. Put one at the root for repo-wide rules, and add nested files in subfolders where conventions differ. Keep each rule concrete and actionable, like the exact test command, so the agent and your reviewers point to the same source.

  • How do I keep a Codex diff small enough to review? Bound the task in the prompt instead of asking for broad improvements. Name the file or function to change, then name the test or lint command that proves it. If verifying the change needs more than one or two commands, it is too big for one pass, so split it into separate, checkable steps.

  • Does MCP make my setup less safe? Not by itself, but an unreviewed connector hides risk. MCP turns outside systems into named boundaries, which is the opposite of hidden access, as long as you review scope first. Write a short note listing what the agent may read, what it may change, and what needs human approval before the connector becomes default behavior.

  • What belongs in a review handoff? The changed files, the exact commands run, their output, and the one thing the reviewer should inspect first. A prose summary is not auditable on its own, so always pair it with command output a teammate can rerun. Standardizing this shape across every task is what lets reviewers approve quickly and consistently.

  • Where do I find the official Codex CLI docs? Start with the Codex quickstart and the Codex CLI docs, then read CLI features and slash commands. For the protocol behind connectors, see the MCP specification. The source lives at openai/codex.

Start here

Pick one repo, add the starter AGENTS.md above, and require a verification command on the next Codex change. Then check whether a new reviewer can defend the merge without replaying the chat.
