codex-cli 0.125.0: reviewable agent loops
An operational memo for codex-cli 0.125.0: reviewable agent loops, AGENTS.md pins, verification transcripts, and connector rosters.

A release is waiting on your verdict about codex-cli 0.125.0, and the version number is the least durable part of the answer: reviewability comes from the operating contract around the run, not from the changelog. A reviewable agent loop is a Codex run whose intent, scope, and verification can be traced without replaying the chat. This memo is about governance hygiene, not hype cycles.
The evaluation underneath the upgrade
Counter-thesis: upgrading to codex-cli 0.125.0 buys you capability; it buys you nothing reviewable.
The wrong path: We believed velocity would compound if we parallelized agent streams. We ran the experiment, and the expensive bug was duplicated edits that nobody reconciled, plus MCP calls that looked harmless until credentials entered the transcript.
Diagnosis: normalization of deviance, Diane Vaughan's term. Every skipped transcript that ends fine makes the next skip feel safer, until skipping is the standard and the team is borrowing confidence it never earned.
Thesis: Codex confidence tracks CLI transcripts, not screenshots.
If your squad trains engineers, the scarcest asset is not tokens; it is inspectable intent.
Where Codex drifts
Named fix: Verification latch. Exec shortcuts that skip tests let regressions return quietly, so require a transcript snippet showing tests ran after codegen. Speed wins once; discipline wins weekly. That is the move that makes review cheaper.
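One way to make the latch mechanical is a small check over the saved transcript. The sketch below is a minimal illustration, not a Codex CLI feature: it assumes a plain-text transcript in which file changes are logged on lines starting with a hypothetical `EDIT ` prefix and shell commands on lines starting with `$ `. Both prefixes and the list of test commands are assumptions to adapt to however your team records runs.

```python
from typing import Iterable, Sequence

# Hypothetical markers; they are not a documented Codex CLI transcript format.
EDIT_PREFIX = "EDIT "  # a line noting that the agent changed a file
TEST_COMMANDS = ("pytest", "npm test", "cargo test", "go test")


def latch_holds(transcript: Iterable[str],
                test_commands: Sequence[str] = TEST_COMMANDS) -> bool:
    """Return True when a test command appears after the last recorded edit."""
    last_edit = -1
    last_test = -1
    for index, line in enumerate(transcript):
        stripped = line.strip()
        if stripped.startswith(EDIT_PREFIX):
            last_edit = index
        elif stripped.startswith("$ ") and any(
            stripped[2:].startswith(command) for command in test_commands
        ):
            last_test = index
    # A run with no test command at all fails the latch.
    return last_test > last_edit
```

The check only confirms that tests ran after the final edit; whether they passed still has to be read from the pasted output, which is why the snippet itself belongs in the review.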
Named fix: Browser bridge note. Chrome workflows that diverge from CLI habits show reviewers two truths. Document staging URLs and credential boundaries beside browser tasks, and demos stop contradicting CI artifacts. Treat it as a workflow rule, not a preference.
Named fix: Model pin note. Teams that swap models casually unsettle their own review expectations, because different models imply different risk appetite. Pin the default model and escalation rule inside AGENTS.md, and leads can reason about blast radius again.
Named fix: Connector roster. Connectors that accumulate quietly erode least-privilege, and each server expands the blast radius. Keep a Markdown roster checked into the repo root, mirroring the contract language of the MCP specification, and security reviews start grounded. Use it before adding another connector.
# AGENTS.md verification snippet
- Every Codex CLI run ends with the transcript snippet reviewers can replay.
- Pair browser evidence with the project's normal CLI checks before merge.
- If MCP servers are enabled, list allowed actions beside each connector name.
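The roster rule can be linted before a security review. The following is a sketch under stated assumptions: it supposes the roster is a Markdown table with three columns (connector name, owner, allowed actions) in that order. That layout and the function name `roster_gaps` are hypothetical; the memo only asks that owners and allowed actions sit beside each connector.

```python
def roster_gaps(markdown: str) -> list[str]:
    """Return connector names whose owner or allowed-actions cell is empty."""
    gaps = []
    rows = [
        line.strip()
        for line in markdown.splitlines()
        if line.strip().startswith("|")
    ]
    for row in rows[2:]:  # skip the header row and the |---| separator row
        cells = [cell.strip() for cell in row.strip("|").split("|")]
        name = cells[0] if cells and cells[0] else "(unnamed)"
        if len(cells) < 3:
            # A row with missing columns counts as a gap.
            gaps.append(name)
            continue
        owner, actions = cells[1], cells[2]
        if not owner or not actions:
            gaps.append(name)
    return gaps
```

Run against the checked-in roster file, a non-empty result is a reason to pause before adding another connector.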
Teams anchor the habit in the Review step of our methodology: receipts meet responsibility there. The neighboring release note, Codex CLI 0.123.0: workflows that hold up, installs the same contract from the diff side, and the rehearsal drills live under CLI workflows.
Reviewer proof table
| Gate | Question |
|---|---|
| Reviewer path | Can someone unfamiliar trace intent without chat replay? |
| Risk routing | Were red folders touched, and who approved? |
| Replay proof | Which commands prove regression guards? |
| Receipt match | Does the PR body list scopes + verification transcript? |
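The receipt-match gate lends itself to a simple automated check. As a hedged sketch, the code below assumes the PR body uses Markdown headings named "Scopes" and "Verification transcript"; those heading names and the helper `missing_receipts` are illustrative choices, not a convention defined by Codex CLI or any hosting platform.

```python
import re

# Hypothetical section names; rename to match your PR template.
REQUIRED_SECTIONS = ("Scopes", "Verification transcript")


def missing_receipts(pr_body: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Return required sections that are absent or have no content under them."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in pr_body.splitlines():
        match = re.match(r"^#{1,6}\s+(.*\S)\s*$", line)
        if match:
            # Start collecting lines under a new heading.
            current = match.group(1).lower()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    missing = []
    for name in required:
        content = sections.get(name.lower())
        if content is None or not any(text.strip() for text in content):
            missing.append(name)
    return missing
```

An empty list means both receipts are present; it says nothing about whether the transcript is convincing, which remains the reviewer's call.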
Evidence checklist
- Primary-doc links were smoke-checked after publishing edits.
- MCP connectors mentioned (if any) list owners.
- Verification command output is pasted or linked.
- Forked agent work lists parent + child responsibilities.
Mechanics live elsewhere: the Codex CLI docs and features page describe the loop machinery, the slash commands reference names the verbs, the quickstart bootstraps a machine, and the openai/codex repository tracks releases. None of them sign your review. Architecture judgement stays human; agents accelerate execution, not ownership.
Synthesis: a screenshot is a photo of the demo; a transcript is the tape the incident review replays. Keep the tape.
Best ways to use this research
- Best for: Codex teams deciding which AGENTS.md instruction, CLI workflow, MCP boundary, or verification loop to standardize next.
- Best first artifact: turn one named fix into an AGENTS.md rule, verification checklist, MCP note, or review receipt before the next automated run.
- Best comparison angle: score your last three agent loops against the reviewer proof table; keep the loop with the shortest auditable trail.
Common questions
- Does codex-cli 0.125.0 make agent loops reviewable on its own?
  No. Reviewability comes from the operating contract around codex-cli 0.125.0, not from the release itself; the failure is rarely tool quality. The contract has four named parts here: a verification latch, a browser bridge note, a model pin inside AGENTS.md, and a connector roster in the repo root.
- What is a reviewable agent loop?
  A reviewable agent loop is a Codex run whose intent, scope, and verification can be traced without replaying the chat. The reviewer proof table makes it concrete: red folders need named approval, commands prove regression guards, and the PR body lists scopes plus the verification transcript.
- Why do MCP connectors get a roster in this memo?
  Because MCP calls look harmless until credentials enter the transcript, and connectors that accumulate quietly erode least-privilege. A Markdown roster checked into the repo root grounds security reviews, since each added server expands the blast radius a review eventually has to explain.
- What does the verification latch require after codegen?
  A transcript snippet showing tests ran, pasted where reviewers can replay it. Exec shortcuts that skip tests let regressions return quietly, and the mundane evidence is the point: green merges start correlating with the actual ritual instead of the team's borrowed confidence.
Next step
The white paper packages this operating contract end to end; read it before the next release evaluation lands on your desk.
Related research

Codex workflows: governance that lives in the repo
How to govern codex workflows from the repo: a connector roster, a ten-line done checklist, a slash catalog, and a verification latch reviewers can replay.

Codex-cli 0.130.0: workflows that survive the update
What codex-cli 0.130.0 means for production repos: the AGENTS.md boundaries, MCP permissions, and review receipts that hold across any Codex CLI release.

Codex CLI 0.123.0: workflows that hold up
Codex CLI 0.123.0 workflows that hold up in review: replay recipes in the diff, a pinned model, a connector roster, and a ten-line done checklist.