
Codex CLI review receipts

A Codex code review workflow for AGENTS.md, MCP, and verification loops that turns CLI diffs into reviewable receipts.

Rogier Muller, May 27, 2026, 6 min read

A review receipt is a short, repeatable record that travels with a diff: what changed, which repo rules applied, what command verified it, and what a connector touched. If your Codex changes keep stalling in review, a receipt is usually the missing piece. Codex CLI, OpenAI's coding agent, generates code fast, but speed is rarely the problem. The problem is trust: a diff a reviewer can read, checks they can rerun, and a handoff that does not require replaying the chat.

Most teams reach for longer prompts and bigger context first. That tends to make diffs larger and lets local conventions drift, so reviewers spend their time guessing intent instead of checking risk. The fix is the opposite of more words. Put the rules in the repo, prove the change at the command line, and keep a receipt.

Put your rules in AGENTS.md, not in the prompt

When you hit the "why did it do that?" moment, the answer should live in a file, not in your memory of a chat. In Codex, that file is AGENTS.md, plus any nested AGENTS.md files closer to the code. Scoped local rules beat one flat note at the root.

Keep the top-level AGENTS.md short. Add scoped instructions in the directories where the relevant code lives. Use override files only for temporary exceptions, and say so out loud. That way a reviewer can trace any behavior back to a file, which kills the "works on my prompt" problem.

The rule of thumb: if a rule matters for review, it belongs in the repo.
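
To make that traceability concrete, here is a minimal sketch of how a reviewer's helper script might list the instruction files that apply to a given path. It only illustrates the "closest file wins" precedence described above; it is not Codex's own lookup code, and the function name `applicable_instruction_files` is a hypothetical chosen for this example.

```python
from pathlib import Path


def applicable_instruction_files(
    repo_root: Path, target: Path, name: str = "AGENTS.md"
) -> list[Path]:
    """Return instruction files that apply to `target`, root first, closest last.

    Illustrative helper only (hypothetical name); it mirrors the precedence
    described in the article rather than any real Codex internals.
    """
    repo_root = repo_root.resolve()
    target = target.resolve()
    # Start from the directory that contains the target file.
    start = target if target.is_dir() else target.parent
    if start != repo_root and repo_root not in start.parents:
        raise ValueError(f"{target} is not inside {repo_root}")
    # Collect directories from the target's directory up to the repo root.
    chain = [start, *start.parents]
    chain = chain[: chain.index(repo_root) + 1]
    found = [d / name for d in chain if (d / name).is_file()]
    # Reverse so the list reads root-first; the last entry is the closest file.
    return list(reversed(found))
```

The last entry in the returned list is the file a reviewer should read first, and the full list is what goes on the receipt's "AGENTS.md path(s)" line.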

Make verification part of the task

Codex CLI is built for the command line, so treat verification as a loop, not a closing ceremony: change, run checks, inspect, repeat. The point is not to run more checks. The point is to record the ones you ran so a reviewer does not have to take your word for it.

Every Codex-authored change should carry the command you ran, the result, and the next check if the first one failed. Reviewers do not want a novel. They want proof the diff was exercised in the environment the repo expects. No receipt, no trust.
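
As a rough sketch of what "carry the command, the result, and the next check" could look like in practice, the snippet below runs one check and formats it as the Verification lines of a receipt. The names `VerificationRecord` and `run_check` are assumptions made for this example, not part of Codex CLI.

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class VerificationRecord:
    """One executed check: what ran, how it exited, and the end of its output."""
    command: str
    exit_code: int
    output_tail: str

    @property
    def result(self) -> str:
        return "pass" if self.exit_code == 0 else "fail"

    def to_receipt(self, follow_up: str = "") -> str:
        # Render the fields in the same order a reviewer reads them.
        return "\n".join([
            "Verification:",
            f"- Command run: {self.command}",
            f"- Result: {self.result} (exit {self.exit_code})",
            f"- Follow-up check: {follow_up or 'none'}",
        ])


def run_check(argv: list[str], tail_lines: int = 5) -> VerificationRecord:
    """Run a command and keep only what the receipt needs (hypothetical helper)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    lines = (proc.stdout + proc.stderr).strip().splitlines()
    return VerificationRecord(
        command=shlex.join(argv),
        exit_code=proc.returncode,
        output_tail="\n".join(lines[-tail_lines:]),
    )


if __name__ == "__main__":
    # Stand-in command; replace with whatever check the repo expects.
    record = run_check([sys.executable, "-c", "print('checks ran')"])
    print(record.to_receipt())
```

Keeping only the tail of the output is a deliberate choice here: the receipt stays short, and the full log remains one rerun away.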

Keep MCP on a short leash

MCP is the protocol that connects Codex to outside systems like databases, issue trackers, or internal APIs. It is genuinely useful, but every connector you add widens what a reviewer has to think about. So name it before you use it.

For each task, write a one-line boundary note: the connector, the data it can reach, and why this task needs it. That shifts the question from "can the agent use tools?" to "did we approve this tool for this job?" It also keeps code review and integration risk apart, because they are different conversations that deserve different attention.
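
One way to keep the boundary note honest is to treat it as data and compare it with the connectors a task actually used. The sketch below assumes you can list the connectors by name; `BoundaryNote`, `unapproved_connectors`, and the connector names in the usage are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryNote:
    """The one-line note: which connector, what data, and why this task needs it."""
    connector: str
    data_scope: str
    reason: str

    def as_line(self) -> str:
        return f"{self.connector}: reaches {self.data_scope}; needed because {self.reason}"


def unapproved_connectors(used: set[str], notes: list[BoundaryNote]) -> set[str]:
    """Return connectors that were used without a boundary note for this task."""
    approved = {note.connector for note in notes}
    return used - approved


if __name__ == "__main__":
    # Example values are made up for illustration.
    notes = [BoundaryNote("issue-tracker", "open tickets, read-only", "link the fix to its ticket")]
    print(notes[0].as_line())
    print(unapproved_connectors({"issue-tracker", "staging-db"}, notes))
```

An empty set means every connector touched was named up front; anything else is a question for the integration conversation, not the code review.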

Review in slices, not one big branch

A broad opinion on a broad diff is how real bugs slip through. Split the change into three slices and review each against its own expectation: the behavior change, the test change, and the instruction change.

This is plain, and it works. A reviewer who is checking one thing at a time catches more than one juggling three questions at once. Smaller slices make sharper reviews. If a step keeps repeating across tasks, like test setup or release checks, move it into a shared skill or command so Codex invokes it the same way every time instead of leaving prompt fragments nobody owns.
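
The three-slice split described in this section can be approximated by sorting the changed paths before review starts. This is a sketch under stated assumptions: the test naming conventions (`tests/` directories, `test_` prefixes, `_test` suffixes) and the function names are illustrative, so adjust them to whatever your repo actually uses.

```python
from pathlib import PurePosixPath

# Assumed conventions; edit these to match the repo under review.
INSTRUCTION_FILES = {"AGENTS.md"}
TEST_DIRS = {"tests", "test", "__tests__"}


def slice_for(path: str) -> str:
    """Classify one changed path as 'instruction', 'test', or 'behavior'."""
    p = PurePosixPath(path)
    if p.name in INSTRUCTION_FILES:
        return "instruction"
    in_test_dir = any(part in TEST_DIRS for part in p.parts[:-1])
    looks_like_test = p.name.startswith("test_") or p.stem.endswith("_test")
    if in_test_dir or looks_like_test:
        return "test"
    return "behavior"


def split_into_slices(paths: list[str]) -> dict[str, list[str]]:
    """Group changed paths so each slice can be reviewed against its own expectation."""
    slices: dict[str, list[str]] = {"behavior": [], "test": [], "instruction": []}
    for path in paths:
        slices[slice_for(path)].append(path)
    return slices
```

Feeding it the file list from a diff gives a reviewer three short lists instead of one long one, and a non-empty instruction slice is a prompt to read the rule change before the code.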

Here is a starter receipt you can paste into a pull request template for Codex-authored diffs:

# Codex Review Receipt

Change:
- What changed:
- Why it changed:
- Files touched:

Instructions:
- AGENTS.md path(s):
- Temporary override used? yes/no
- Skill or command used:

Verification:
- Command run:
- Result:
- Follow-up check:

MCP:
- Connector used:
- Scope approved for this task:
- Data touched:

Review:
- Risk to inspect:
- Reviewer note:
- Merge condition:

That is enough to start. If you make it a habit on the team, the receipt becomes the merge gate without turning review into theater.
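
If the receipt is going to act as a merge gate, a small check can flag fields that were left blank. The sketch below parses the template shown above and reports any `- Label:` line with no value, plus the untouched `yes/no` placeholder. The function name `unfilled_fields` is hypothetical, and wiring it into CI is left as an assumption about your setup.

```python
import re

# A receipt field is a bullet whose label ends in ":" or "?", followed by its value.
FIELD = re.compile(r"^- (?P<label>[^:?]+[:?])\s*(?P<value>.*)$")


def unfilled_fields(receipt: str) -> list[str]:
    """Return the labels of receipt fields that are empty or still a placeholder."""
    missing = []
    for line in receipt.splitlines():
        match = FIELD.match(line.strip())
        if not match:
            continue  # Section headings and free text are not fields.
        value = match.group("value").strip()
        if value == "" or value == "yes/no":
            missing.append(match.group("label").rstrip(":?").strip())
    return missing


if __name__ == "__main__":
    sample = "\n".join([
        "Verification:",
        "- Command run: pytest -q",
        "- Result:",
        "- Follow-up check: none",
    ])
    print(unfilled_fields(sample))
```

The check only confirms that something was written in each field. Whether the content is adequate is still the reviewer's call.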

Common questions

Where does Codex CLI look for instructions? Codex reads AGENTS.md at the repo root and nested AGENTS.md files in subdirectories, with the closest file winning for code in that path. Keep the root file short and put scoped rules where the code lives. Override files are for temporary exceptions, and you should flag them in the receipt so reviewers know the deviation was intentional.

Do I still need a human reviewer if I use a receipt? Yes. The receipt does not replace judgment; it focuses it. By recording what changed, what was verified, and which connectors were touched, the receipt lets a human spend their time on risk and merge conditions instead of reconstructing intent from a diff. The agent moves fast only because scope, checks, and ownership stay visible.

What is MCP and why does it matter for review? MCP is the protocol that lets Codex reach outside systems such as databases or internal APIs. It matters for review because every connector expands what a reviewer has to vet. A one-line boundary note naming the connector, its data access, and the reason keeps integration risk legible and separate from the code change itself.

Does a tighter loop slow my team down? A little, up front. Writing rules into the repo is a one-time cost per repo, and recording one verification command adds minutes per change. It pays back fast: reviews get shorter, fewer diffs bounce, and nobody replays a chat to understand a merge. The cost is real but small.

What version of Codex CLI does this apply to? This workflow holds across recent Codex CLI releases, including 0.134.0. Treat each changelog as a prompt to ask one question: what should we change in the loop this week? The useful read is never just "what is new"; it is which rule, check, or boundary note you tighten next.

Start with one receipt

Add the receipt block to your pull request template and require it on the next Codex-authored diff. For a deeper walkthrough, take it into Codex CLI workflows and check whether a fresh reviewer can defend the merge without opening the chat.
