
A Practical AI Review Workflow

A Codex-first convention for aligning AI code assistants, MCP boundaries, and review guardrails across a team.

Rogier Muller · June 29, 2026 · 8 min read

Align a split engineering team by agreeing on where AI may act, what evidence every AI-assisted change must include, and who can approve exceptions. Do that before standardizing on a favorite assistant, because the workflow should survive a switch between OpenAI Codex (OpenAI's coding agent), Cursor (Anysphere's AI code editor), and Claude Code (Anthropic's coding agent).

An AI code review workflow for teams is a shared convention for prompting, changing, verifying, and reviewing code when a coding agent helped. Good AI coding training for teams turns that convention into daily practice, not a slide deck.

Start with one shared rule, not one favorite assistant

Write the team rule in the repo before you debate which assistant is best. The rule should say what the agent may change, what it must not change without approval, and what proof belongs in the pull request.
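The "must not change without approval" part of the rule can also be checked mechanically. The following is a minimal sketch, assuming the protected areas can be expressed as path patterns; the patterns and the `protected_changes` helper are hypothetical and not part of any tool named in this article.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns for areas the agent must not change without approval.
# Adjust them to your repository layout; none of these paths come from the article.
PROTECTED_PATTERNS = [
    "src/auth/*",
    "src/billing/*",
    "deploy/*",
    "config/production*",
]


def protected_changes(changed_files, patterns=PROTECTED_PATTERNS):
    """Return the changed files that match any protected pattern."""
    return [
        path
        for path in changed_files
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    changed = ["src/auth/session.py", "src/ui/button.py", "deploy/prod.yaml"]
    for path in protected_changes(changed):
        print(f"needs explicit approval: {path}")
```

A check like this does not replace the written rule. It only gives reviewers an early flag when a diff touches an area the rule names.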

This matters because engineering team AI adoption usually fails at the seams. One developer uses Codex for tests, another uses Cursor Agent for refactors, and a third refuses all generated code because the review burden feels unclear.

The trap is making this a tool preference fight. Keep the convention cross-tool, then add product-specific notes for Codex CLI commands, Cursor rules, or Claude Code memory only where they change behavior.

A useful starting point is the governance layer in the related training topic: team rules first, tool behavior second, review evidence always.

Put tool boundaries where the agent reads them

For Codex users, put durable repo rules in AGENTS.md. Use it for architecture constraints, unsafe areas, required commands, review expectations, and MCP boundaries.

MCP is the Model Context Protocol, an integration layer that lets AI tools connect to external systems such as GitHub, Slack, databases, design files, and internal knowledge bases. That power is useful, but it also means the repo needs a clear boundary between read-only context, write actions, and secrets.

A concrete AGENTS.md note might say: “The GitHub MCP server may read issues and pull requests. Do not create branches, approve reviews, modify labels, or post comments unless the task explicitly asks for it.”
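If the team also wants a machine-checkable version of that note, the same boundary can be written as a deny-by-default policy. This is a sketch under assumptions: the tool names below are hypothetical placeholders, and the `is_allowed` helper is illustrative rather than part of any MCP server or client.

```python
# Hypothetical tool names; map them to whatever your GitHub MCP server exposes.
READ_TOOLS = {"get_issue", "list_issues", "get_pull_request", "list_pull_requests"}
WRITE_TOOLS = {"create_branch", "approve_review", "modify_labels", "post_comment"}


def is_allowed(tool, task_allows_writes=False):
    """Reads are always allowed, writes only when the task opts in, unknown tools never."""
    if tool in READ_TOOLS:
        return True
    if tool in WRITE_TOOLS:
        return task_allows_writes
    return False
```

Denying unknown tools by default means a newly added server capability stays blocked until someone classifies it.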

The trap is hiding this in onboarding docs that the agent never sees. Humans may remember a policy page; coding agents follow the context you actually give them.

Test MCP behavior without asking a model to guess

Treat MCP integrations like production dependencies. Write fixtures for the tool calls you expect, record the allowed inputs and outputs, and test the server without relying on a live model to do the right thing.

That is why the Ocarina project is an interesting signal: it points at automating and testing MCP servers from YAML with no LLM in the loop. Even if you do not adopt that project, copy the idea: fixtures first, agent second.

For example, a GitHub MCP test can assert that list_pull_requests returns only public metadata needed for review, while merge_pull_request is unavailable in the agent’s configured environment. That gives reviewers something firmer than “the assistant probably will not do that.”
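The assertion described above can be sketched without any model or network call. This example does not use Ocarina or an MCP SDK: `FakeGitHubServer` is a hypothetical stand-in for the configured server, and the pull request field names are assumptions chosen for illustration.

```python
# Fixture: the tools the agent's environment is expected to expose.
# Tool names come from the example above; the field names are assumptions.
ALLOWED_TOOLS = {"list_pull_requests"}
ALLOWED_PR_FIELDS = {"number", "title", "state", "author"}


class FakeGitHubServer:
    """Stand-in for a configured MCP server; not a real MCP client."""

    def list_tools(self):
        return ["list_pull_requests"]

    def call_tool(self, name, arguments):
        if name not in self.list_tools():
            raise PermissionError(f"tool not available: {name}")
        return [{"number": 12, "title": "Fix flaky test", "state": "open", "author": "sam"}]


def check_boundaries(server):
    """Return a list of boundary violations; an empty list means the fixture holds."""
    violations = []
    exposed = set(server.list_tools())
    # Any tool outside the fixture, such as merge_pull_request, is a violation.
    for name in sorted(exposed - ALLOWED_TOOLS):
        violations.append(f"unexpected tool exposed: {name}")
    # The read tool may return only the fields the fixture allows.
    if "list_pull_requests" in exposed:
        for pr in server.call_tool("list_pull_requests", {"state": "open"}):
            for field in sorted(set(pr) - ALLOWED_PR_FIELDS):
                violations.append(f"unexpected field in pull request: {field}")
    return violations
```

In a real setup the fake would be replaced by a thin adapter around the actual server, while the fixture and the check stay the same.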

The trap is treating MCP as harmless because it feels like context. An integration that can read private data, mutate tickets, or trigger deployments belongs in your AI coding governance review.

Keep verification close to the pull request

Require every AI-assisted change to include the exact verification loop the author ran. For Codex CLI work, that can be as simple as npm test, npm run lint, a targeted unit test, and a short note explaining what the agent changed.

This matters because reviewers should not have to reverse-engineer whether the author inspected the result. The PR should make the human review path obvious: intent, agent scope, files touched, tests run, risks left open.

A small hook can help. Teams often add a pre-push hook that runs the fastest safe checks, then ask the author to paste a summary of the command output into the PR when Codex or another coding agent touched behavior.
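One way to write such a hook is as a short Python script. This is a sketch, assuming an npm project where `npm run lint` and `npm test` are the fast checks; swap in whatever commands your repository actually uses.

```python
#!/usr/bin/env python3
"""Sketch of a pre-push hook that runs a few fast checks before a push."""
import subprocess
import sys

# Fast checks only; these commands assume an npm project.
FAST_CHECKS = [
    ["npm", "run", "lint"],
    ["npm", "test"],
]


def run_checks(commands):
    """Run each command in order and return the ones that failed."""
    failed = []
    for command in commands:
        try:
            ok = subprocess.run(command).returncode == 0
        except OSError:
            ok = False  # the tool is missing or could not be started
        if not ok:
            failed.append(" ".join(command))
    return failed


def main(commands=FAST_CHECKS):
    """Report failed checks on stderr and return a process exit status."""
    failed = run_checks(commands)
    for command in failed:
        print(f"pre-push check failed: {command}", file=sys.stderr)
    return 1 if failed else 0
```

To install it, save the file as `.git/hooks/pre-push`, make it executable, and end it with `sys.exit(main())`. Git aborts the push when the hook exits with a non-zero status.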

The trap is asking the assistant to “review its own work” and stopping there. Agent review can catch issues, but it is not a substitute for deterministic checks and human ownership.

For a narrower rubric on reviewer behavior, see Review Rules for AI Coding Agents.

Paste this convention into AGENTS.md

Use this as a starter convention. Keep it short enough that people will actually follow it.

# AI-assisted change convention

This repository allows AI-assisted coding with Codex or another approved coding assistant when the human author remains responsible for the change.

## Scope

- AI may draft code, tests, docs, migrations, and refactors inside the requested task scope.
- AI must not change authentication, authorization, billing, data retention, encryption, deployment, or production configuration without explicit reviewer approval.
- AI must not introduce new dependencies, external services, background jobs, or MCP write actions without calling them out in the pull request.

## MCP boundaries

- MCP servers may be used for read-only context unless the task explicitly allows writes.
- GitHub MCP may read issues, pull requests, checks, and file metadata.
- Do not approve reviews, merge pull requests, modify labels, post comments, rotate secrets, or trigger deployments through MCP unless the task says so.
- Never paste secrets, customer data, private keys, or unredacted production logs into a prompt.

## Required verification

Before opening a pull request, run the smallest reliable verification loop:

- Formatter: `<command>`
- Linter: `<command>`
- Unit tests: `<command>`
- Targeted integration test, if behavior crosses a boundary: `<command>`

If a command cannot run locally, explain why in the PR.

## Pull request checklist

- [ ] I state where AI helped: planning, code, tests, docs, review, or debugging.
- [ ] I reviewed the generated diff line by line.
- [ ] I checked that the change stays inside the requested scope.
- [ ] I list the verification commands I ran and summarize the result.
- [ ] I call out risky files, skipped checks, or assumptions.
- [ ] I confirm no secrets, customer data, or private logs were pasted into an AI tool.
- [ ] I note any MCP server used and whether it was read-only or write-capable.

## Reviewer rule

Review the code, not the confidence of the assistant. If the PR lacks scope, verification, or MCP disclosure, request changes before reviewing design details.

Adopt it like any other engineering convention. One engineer proposes the first version, two reviewers from different parts of the codebase edit it, and the final text lands in the root AGENTS.md with narrower rules in nested AGENTS.md files where needed.

Then enforce one review rule every time: no AI-assisted PR merges without the checklist filled in. This is the lightest useful code review guardrail because it changes reviewer behavior without banning the tools that produce the productivity gains.
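The "checklist filled in" rule is easy to check automatically. The sketch below assumes the PR description is available as a string and uses Markdown task-list syntax like the checklist above; how you fetch the description from your hosting platform is left out, and both helper names are hypothetical.

```python
import re

# Matches Markdown task-list items such as "- [ ] text" or "- [x] text".
CHECKBOX = re.compile(
    r"^[ \t]*[-*][ \t]+\[(?P<mark>[ xX])\][ \t]+(?P<text>.+)$",
    re.MULTILINE,
)


def unchecked_items(pr_body):
    """Return the text of every checklist item that is still unticked."""
    return [
        match.group("text").strip()
        for match in CHECKBOX.finditer(pr_body)
        if match.group("mark") == " "
    ]


def checklist_complete(pr_body):
    """True only if the body has at least one checklist item and none are unticked."""
    has_items = CHECKBOX.search(pr_body) is not None
    return has_items and not unchecked_items(pr_body)
```

A CI job could fail the build when `checklist_complete` returns false and print the result of `unchecked_items` so the author sees exactly what is missing.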

Common questions

  • Our engineering team is split on using AI code assistants. How do we align them?

    Align them by separating tool choice from team obligations. Let developers use approved assistants, but require the same scope limits, verification evidence, MCP disclosure, and human review for every AI-assisted PR. Start with one AGENTS.md convention and revisit it after two weeks of real pull requests.

  • What should an AI code review workflow for teams include?

    It should include scope rules, tool boundaries, deterministic checks, PR disclosure, and a reviewer stop rule. The copyable checklist above is the minimum useful artifact: it tells authors what to prove and gives reviewers a clear reason to request changes when evidence is missing.

  • Should we allow MCP servers in coding-agent workflows?

    Yes, but start read-only and promote write access deliberately. MCP can make Codex and other coding agents much more useful by connecting them to issues, docs, and repositories, but every server should have an owner, allowed actions, and at least one test or fixture for risky behavior.

  • Do we need different rules for Codex, Cursor, and Claude Code?

    You need one shared convention plus small tool-specific files. Keep repo-wide behavior in AGENTS.md, add Cursor rules or Claude Code memory where those tools read them, and avoid maintaining three conflicting policies. The durable rule is the same: agent help is allowed, but review evidence is required.

  • Will this slow down senior engineers?

    A lightweight workflow should slow only unsafe changes. Senior engineers usually move faster when the convention removes review ambiguity: they know what to disclose, reviewers know what to check, and the team stops relitigating AI usage in every pull request. Keep the checklist under one screen.

Make the first week small

Pick one active repo, add the convention to AGENTS.md, and require the checklist on AI-assisted PRs for one week. After five real reviews, tighten the rule where reviewers still had to guess.

One methodology lens

One useful way to read this through our methodology is the Plan step: delegate first-pass decomposition and dependency mapping, review the sequencing and assumptions, and keep ownership of scope and priorities. If that split is still fuzzy, the workflow usually is too.
