
A Safer Codex Review Loop

Use Codex CLI, AGENTS.md, verification, and review receipts to make AI-assisted code review safer for teams.

Rogier Muller | June 20, 2026 | 7 min read

Use OpenAI Codex, OpenAI's coding agent, to review code changes by keeping the diff small, loading repo rules from AGENTS.md, asking for a findings-first review, and requiring a verification receipt before merge. Codex should not be the final approver; it should be the fast reviewer that catches omissions and makes the human review easier.

A Codex review receipt is a short, pasteable record of what changed, what Codex checked, what commands ran, and what still needs human judgment. For teams training on Codex CLI, it is one of the most useful workflows to standardize early.

Start with a narrow review task

Ask Codex CLI to review one diff, one pull request, or one risky area at a time. A good prompt is concrete: "Review this branch against main, focus on auth and migrations, return findings first, then list the tests you would run."

This matters because code review is not the same as code generation. In a Codex code review, you want skepticism, traceability, and boring output that a maintainer can act on.

The trap is asking for a broad quality pass over a whole repo. That usually produces confident summaries and weak findings. If your internal runbook has an entry such as "OpenAI Codex CLI: review code changes", make it point to a narrow review recipe, not a vague command to inspect everything.
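One way to keep the task narrow is to generate the prompt from a small, explicit structure instead of typing it freehand. The sketch below is illustrative only: `ReviewTask` and its fields are hypothetical names, and it stops at building the `git diff` command and the prompt text. How you hand the prompt to Codex CLI is left to your own setup.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewTask:
    """One narrow review request: a single diff and a short focus list (hypothetical helper)."""

    base: str = "main"
    focus: list[str] = field(default_factory=list)

    def diff_command(self) -> list[str]:
        # Three-dot syntax diffs HEAD against its merge base with `base`.
        return ["git", "diff", f"{self.base}...HEAD"]

    def prompt(self) -> str:
        # Fall back to a single default focus so the prompt is never open-ended.
        focus = " and ".join(self.focus) if self.focus else "correctness"
        return (
            f"Review this branch against {self.base}. "
            f"Focus on {focus}. "
            "Return findings first, then list the tests you would run."
        )


task = ReviewTask(base="main", focus=["auth", "migrations"])
print(" ".join(task.diff_command()))
print(task.prompt())
```

Keeping the focus list short is the point: if it grows past two or three items, that is usually a sign the review should be split.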

Put review rules in AGENTS.md

Put durable review rules in AGENTS.md so the Codex agent sees the same constraints your team expects humans to follow. Keep the root file short, then add nested AGENTS.md files for local rules in services, packages, or apps.

For example, a payments service might tell Codex to flag changes that alter idempotency keys, retry behavior, ledger writes, or schema migrations. A frontend package might focus on accessibility, state ownership, and snapshot churn.

This matters because good review depends on local context. The trap is turning AGENTS.md into a giant handbook. Codex needs the rules that change its behavior during the task, not every fact your team knows.
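To fill in the "Nested AGENTS.md files read" line of a receipt, it helps to know which instruction files sit on the path to a changed file. The following is a minimal sketch with a hypothetical helper, `agents_files_for`. It only walks directories from the repo root down to the file and does not claim to mirror how Codex itself discovers instruction files.

```python
import tempfile
from pathlib import Path


def agents_files_for(repo_root: Path, changed_file: str) -> list[Path]:
    """Return existing AGENTS.md files from the repo root down to the changed file's directory."""
    directories = [repo_root]
    current = repo_root
    for part in Path(changed_file).parent.parts:
        current = current / part
        directories.append(current)
    # Keep root-first order so broader rules are listed before local ones.
    return [d / "AGENTS.md" for d in directories if (d / "AGENTS.md").is_file()]


# Demo on a throwaway directory tree; the service layout is made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "services" / "payments").mkdir(parents=True)
    (root / "AGENTS.md").write_text("root rules\n")
    (root / "services" / "payments" / "AGENTS.md").write_text("payments rules\n")
    for path in agents_files_for(root, "services/payments/ledger.py"):
        print(path.relative_to(root))
```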

Run Codex from diff to verification

A useful Codex CLI review loop has three passes. First, ask for findings against the diff. Second, ask Codex to propose verification commands. Third, run the commands and paste a receipt into the pull request.

For a real backend PR, that might mean checking the diff, running unit tests for the touched package, running a migration dry run, and asking Codex to explain any remaining risk. For a UI PR, it might mean lint, typecheck, focused tests, and a manual browser check owned by a human.

This matters because review without verification is just commentary. The trap is letting Codex say "looks good" after reading code but before anything executable has happened. A clean Codex review should separate findings, commands, results, and open questions.
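The third pass, running commands and recording results, is easy to script. Here is a rough sketch, assuming each verification step is a plain command whose exit code decides pass or fail. `run_check` and `receipt_lines` are hypothetical helpers, and the two commands are placeholders so the example runs anywhere.

```python
import subprocess
import sys


def run_check(command: list[str]) -> dict:
    """Run one verification command and keep only what a receipt needs."""
    completed = subprocess.run(command, capture_output=True, text=True)
    passed = completed.returncode == 0
    return {
        "command": " ".join(command),
        "result": "pass" if passed else f"fail (exit {completed.returncode})",
    }


def receipt_lines(results: list[dict]) -> str:
    """Format results in the Command/Result shape used by the review receipt."""
    lines = []
    for item in results:
        lines.append(f"- Command: {item['command']}")
        lines.append(f"  Result: {item['result']}")
    return "\n".join(lines)


# Placeholder commands; swap in your real unit tests, lint, or migration dry run.
checks = [
    [sys.executable, "-c", "print('ok')"],
    [sys.executable, "-c", "raise SystemExit(3)"],
]
results = [run_check(check) for check in checks]
print(receipt_lines(results))
```

Recording the exit code rather than a model's summary of the output keeps the receipt tied to something that actually executed.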

Keep MCP helpful and bounded

Use Model Context Protocol, or MCP, when Codex needs approved access to external systems such as GitHub issues, design docs, incident notes, or internal API references. Keep those connections scoped to the review job.

For example, Codex may need a GitHub issue to confirm intended behavior, or a schema registry entry to check compatibility. It probably does not need production secrets, broad database access, or every document your company owns.

This matters because more context is not automatically safer. The trap is treating MCP as a permission shortcut. Write a small boundary note in AGENTS.md that says which MCP sources are allowed for review, which are read-only, and what Codex must summarize when it uses them. If your team is still designing that boundary, start with Add MCP to Codex Safely.
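The boundary note can also be mirrored as a small check over what a reviewer records in the receipt. The sketch below is not an MCP API and does not configure Codex. The source names and the `check_mcp_usage` helper are hypothetical, and it only compares the sources a review reports using against an allowlist.

```python
# Hypothetical allowlist: source name -> access mode permitted during review.
ALLOWED_MCP_SOURCES = {
    "github-issues": "read-only",
    "design-docs": "read-only",
    "schema-registry": "read-only",
}


def check_mcp_usage(used: dict[str, str]) -> list[str]:
    """Return boundary violations for the MCP sources a review reports using."""
    problems = []
    for source, mode in used.items():
        allowed_mode = ALLOWED_MCP_SOURCES.get(source)
        if allowed_mode is None:
            problems.append(f"{source}: not on the review allowlist")
        elif mode != allowed_mode:
            problems.append(f"{source}: used as {mode}, allowed as {allowed_mode}")
    return problems


# Example receipt entries: one allowed source, one that should never appear in a review.
reported = {"github-issues": "read-only", "production-db": "read-only"}
for problem in check_mcp_usage(reported):
    print(problem)
```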

Tie review receipts to Codex versions

As of June 18, 2026, the official Codex changelog lists Codex app 26.616. Even when an app update is small, the version label is useful review metadata for teams running production codebase loops.

Record the Codex app or CLI version when the review matters, especially for regulated code, incident follow-ups, or repeated review tasks. This gives you a way to compare behavior if a future Codex review flags different issues on the same kind of change.

The trap is pretending AI-assisted review is timeless. It is software. Versions, prompts, repo rules, MCP access, and test commands all shape the result, so the receipt should capture enough context for a teammate to reproduce the loop.
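Capturing the version can be automated so nobody has to remember it. This is a minimal sketch: `tool_version` and `receipt_header` are hypothetical helpers, and the example queries the Python interpreter so it runs anywhere. Pointing it at your Codex CLI (for instance with a `--version` flag) is an assumption to check against your installed tool.

```python
import subprocess
import sys
from datetime import date


def tool_version(command: list[str]) -> str:
    """Return the first line a version command prints, or 'unknown' if it cannot run."""
    try:
        completed = subprocess.run(command, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return "unknown"
    output = (completed.stdout or completed.stderr).strip()
    if completed.returncode != 0 or not output:
        return "unknown"
    return output.splitlines()[0]


def receipt_header(version: str, today: date) -> str:
    """Fill the version and date fields of the review receipt header."""
    return "\n".join(
        [
            "Codex surface: Codex CLI",
            f"Codex app or CLI version: {version}",
            f"Date: {today.isoformat()}",
        ]
    )


# Stand-in command so the sketch runs without Codex installed.
print(receipt_header(tool_version([sys.executable, "--version"]), date.today()))
```

Returning "unknown" instead of raising keeps the receipt honest when the tool is missing: the gap is visible rather than silently skipped.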

Copy this review receipt

Paste this into your pull request template, or keep it as a slash-command output target for an OpenAI Codex CLI review. The goal is not paperwork. The goal is to make the final human review faster and less dependent on guesswork.

## Codex review receipt

PR:
Branch:
Reviewer:
Codex surface: Codex CLI
Codex app or CLI version:
Date:

## Scope

- Diff reviewed:
- Files or packages intentionally excluded:
- Review focus:
  - correctness
  - security
  - migrations
  - tests
  - performance
  - product behavior

## Repo instructions used

- Root AGENTS.md read: yes/no
- Nested AGENTS.md files read:
  - path/to/AGENTS.md
- Local rules that mattered:
  - 

## MCP access used

- GitHub issues: none / read-only / specific links
- Docs or design sources: none / read-only / specific links
- Databases or production systems: none
- Notes:

## Codex findings

- Blocking findings:
  - 
- Non-blocking findings:
  - 
- Questions for a human maintainer:
  - 

## Verification run

- Command:
  Result:
- Command:
  Result:
- Manual check:
  Result:

## Final status

- Ready for human review: yes/no
- Known risk:
- Follow-up issue needed: yes/no

Common questions

  • How do I use OpenAI Codex CLI to review code changes?

    Use Codex CLI on a small diff, give it AGENTS.md repo rules, ask for findings first, then require verification commands and a receipt. The citable artifact is the review receipt: it records the Codex surface, version, instructions used, MCP access, findings, test results, and remaining human questions.

  • Can Codex replace a human code reviewer?

    No, Codex should not be the final reviewer for production changes. It is best used as a fast second reader that checks consistency, missing tests, edge cases, and repo-specific rules before a maintainer approves the PR.

  • What belongs in AGENTS.md for code review?

    Put durable rules that affect review decisions: architecture boundaries, risky files, testing expectations, migration rules, security constraints, and package-specific conventions. Keep task-specific requests out of AGENTS.md; those belong in the Codex prompt or pull request context.

  • When should a Codex code review use MCP?

    Use MCP when the review needs approved external context, such as a GitHub issue, design note, API contract, or internal documentation page. Keep access read-only where possible, list the sources in the receipt, and avoid connecting broad systems that do not change the review decision.

  • What makes the review loop production-ready?

    A production-ready loop is repeatable, scoped, and auditable. It has AGENTS.md instructions, a small review prompt, bounded MCP access, verification commands, and a receipt that a teammate can read in under two minutes.

Start with one reviewed PR

Pick one low-risk PR this week and run the receipt end to end. Once the habit feels boring, add it to your team template and make Codex review part of the normal merge path.

One methodology lens

One useful way to read this through our methodology is the Plan step: delegate first-pass decomposition and dependency mapping, review the sequencing and assumptions, and keep ownership of scope and priorities. If that split is still fuzzy, the workflow usually is too.
