
Codex CLI 0.124.0: tighter rollback loops

Codex CLI 0.124.0 as a workflow moment: shrink the rollback contract, pin the model, and keep a connector roster and done checklist where reviewers live.

Rogier Muller · May 3, 2026 · 5 min read

During rollback rehearsals, Codex review threads kept turning into archaeology: scrolling chat to learn why the agent touched a file. We ran the drill on Codex CLI 0.124.0 and stopped blaming the version, because the version was never the problem. A rollback contract is the short repo note that tells a reviewer what to undo and in what order, before the next agent run. The fix is not a longer prompt; it is a contract small enough to read.

Archaeology hour

Counter-thesis: tighter loops do not ship in the changelog; they come from a rollback contract a reviewer can read in one sitting.

The wrong path: we believed reviewers would absorb implicit intent. They did not. Connectors multiplied faster than ownership maps, and intent kept living in chat, where a rollback cannot reach it.

Diagnosis: the XY problem. We kept asking for better prompts when the actual problem was the missing rollback contract, so every improvement optimized the wrong layer.

Thesis: Codex CLI loops win when AGENTS.md owns truth.

The moment of truth is always the same question: why did the agent touch this file? If the answer lives only in chat, the loop is loose, whichever version the Codex CLI docs describe.

Contract changes

Headless vs browser drift undoes rollbacks first. Chrome workflows diverge from CLI habits, reviewers see two truths, and dual rails need explicit handoff language. Named fix: Browser bridge note. Document staging URLs and credential boundaries beside browser tasks. Demos stop contradicting CI artifacts.

Model mismatch makes the undo plan ambiguous, because different models imply different risk appetite. Named fix: Model pin note. Pin the default model and the escalation rule inside AGENTS.md. Leads can reason about blast radius while everything else is in motion.
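A pin is only useful if a reviewer can find it, so a small check can flag a missing one. This is a minimal sketch under assumptions: AGENTS.md is free-form Markdown, so the two labels `Default model` and `Escalation rule` are a hypothetical team convention rather than anything Codex CLI requires, and `example-model` is a placeholder name.

```python
import re

# Hypothetical labels for the model pin note; AGENTS.md is free-form,
# so this is a team convention, not a Codex CLI requirement.
PIN_LABELS = ("Default model", "Escalation rule")


def read_model_pin(agents_md: str) -> dict[str, str]:
    """Return the value found after each pin label, skipping empty values."""
    pins = {}
    for label in PIN_LABELS:
        # Matches lines such as "- Default model: example-model".
        match = re.search(
            rf"^\s*[-*]?\s*{re.escape(label)}:[ \t]*(\S.*)$",
            agents_md,
            flags=re.MULTILINE | re.IGNORECASE,
        )
        if match:
            pins[label] = match.group(1).strip()
    return pins


def missing_pins(agents_md: str) -> list[str]:
    """List the pin labels a reviewer would not find in the file."""
    found = read_model_pin(agents_md)
    return [label for label in PIN_LABELS if label not in found]


# Placeholder content for illustration only.
SAMPLE = """# AGENTS.md
- Default model: example-model
- Escalation rule: ask the lead before switching models
"""
```

Calling `missing_pins(SAMPLE)` returns an empty list. A file with a label but no value after it is reported as missing that pin.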

MCP privilege creep widens what a rollback has to cover. Connectors accumulate quietly, each server expands the blast radius, and least-privilege erodes. Named fix: Connector roster. Keep a Markdown roster checked into the repo root, using the MCP specification as shared language. Security reviews start grounded.
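The roster becomes checkable once it has a predictable shape. The sketch below assumes a hypothetical bullet format, `- name | owner: someone | allows: action, action`, and takes the set of enabled server names as plain input, since how a team collects that list is outside this example. The connector names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    name: str
    owner: str
    allowed_actions: tuple[str, ...]


def parse_roster(roster_md: str) -> dict[str, Connector]:
    """Parse roster bullets written in the hypothetical format
    '- name | owner: someone | allows: action, action'."""
    roster = {}
    for line in roster_md.splitlines():
        line = line.strip()
        if not line.startswith("- "):
            continue  # headings and prose are ignored
        fields = [part.strip() for part in line[2:].split("|")]
        owner = ""
        actions: tuple[str, ...] = ()
        for item in fields[1:]:
            key, _, value = item.partition(":")
            if key.strip() == "owner":
                owner = value.strip()
            elif key.strip() == "allows":
                actions = tuple(a.strip() for a in value.split(",") if a.strip())
        roster[fields[0]] = Connector(fields[0], owner, actions)
    return roster


def roster_gaps(roster: dict[str, Connector], enabled: set[str]) -> dict[str, list[str]]:
    """Report enabled servers absent from the roster, plus entries
    that lack an owner or a list of allowed actions."""
    return {
        "unlisted": sorted(enabled - set(roster)),
        "no_owner": sorted(c.name for c in roster.values() if not c.owner),
        "no_actions": sorted(c.name for c in roster.values() if not c.allowed_actions),
    }


# Invented connector names, for illustration only.
ROSTER = """# MCP connector roster
- issue-tracker | owner: platform-team | allows: read issues, comment
- docs-search | owner: | allows: search
"""
```

With this sample, `roster_gaps(parse_roster(ROSTER), {"issue-tracker", "browser"})` flags `browser` as unlisted and `docs-search` as ownerless, which are the two gaps a security review would ask about first.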

AGENTS spaghetti hides the definition of done that an undo has to restore. AGENTS.md grows unchecked and Codex optimizes the wrong "done"; ambiguity hides in length. Named fix: Done checklist. Definition of Done bullets, ten lines or fewer, at the top of the file. Output aligns with team vocabulary.
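The ten-line limit is easy to enforce mechanically. The sketch below assumes the checklist sits under a heading named `Definition of Done` (a hypothetical heading name) and treats "at the top" as "among the first two headings", which is one reading of the rule, not the only one.

```python
MAX_DONE_LINES = 10  # the "ten lines or fewer" rule from the done checklist
DONE_HEADING = "Definition of Done"  # hypothetical heading name


def headings(agents_md: str) -> list[str]:
    """Return Markdown heading titles in document order."""
    return [
        line.strip().lstrip("#").strip()
        for line in agents_md.splitlines()
        if line.strip().startswith("#")
    ]


def done_checklist(agents_md: str) -> list[str]:
    """Return the bullets directly under the Definition of Done heading."""
    bullets = []
    in_section = False
    for line in agents_md.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            if in_section:
                break  # the next heading ends the checklist
            title = stripped.lstrip("#").strip()
            in_section = title.lower() == DONE_HEADING.lower()
        elif in_section and stripped.startswith(("- ", "* ")):
            bullets.append(stripped[2:].strip())
    return bullets


def checklist_problems(agents_md: str) -> list[str]:
    """Describe every way the file breaks the done checklist rule."""
    bullets = done_checklist(agents_md)
    if not bullets:
        return ["no Definition of Done checklist found"]
    problems = []
    if len(bullets) > MAX_DONE_LINES:
        problems.append(
            f"checklist has {len(bullets)} bullets, limit is {MAX_DONE_LINES}"
        )
    # "Top of the file" is interpreted here as one of the first two headings.
    top = [title.lower() for title in headings(agents_md)[:2]]
    if DONE_HEADING.lower() not in top:
        problems.append("checklist is not at the top of the file")
    return problems
```

Run against a file whose checklist has grown to eleven bullets, `checklist_problems` returns a single message naming the count and the limit. That is a cheap pre-merge signal that the definition of done is drifting back toward length.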

# AGENTS.md verification snippet

- Every Codex CLI run ends with the transcript snippet reviewers can replay.
- Pair browser evidence with the project's normal CLI checks before merge.
- If MCP servers are enabled, list allowed actions beside each connector name.

In our methodology this belongs to Document before it ever reaches Review: the handoff has to survive without the original operator in the room. The drill repo for that habit is Codex CLI workflows, and the Codex GPT-5.5 note runs the same contract against a model swap instead of a version bump.

| Gate | Question |
| --- | --- |
| Receipt match | Does the PR body list scopes + verification transcript? |
| Rules precedence | Which AGENTS.md or SKILL.md rule governed behavior? |
| Connector truth | Which MCP servers fired, and were they expected? |
| Reviewer path | Can someone unfamiliar trace intent without chat replay? |

The reviewer handoff in four checks:

  • Scopes in the PR body match folders in the diff.
  • Primary-doc links were smoke-checked after publishing edits.
  • MCP connectors mentioned (if any) list owners.
  • Verification command output is pasted or linked.
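The first check, scopes matching folders in the diff, can be sketched as a comparison of two lists. This is an illustration under assumptions: the declared scopes are folder paths already extracted from the PR body, and the changed paths are whatever the team's tooling reports (for example the output of `git diff --name-only`). The function name and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def scope_mismatches(
    declared_scopes: list[str], changed_paths: list[str]
) -> dict[str, list[str]]:
    """Compare scopes declared in a PR body with paths changed in the diff."""
    scopes = [PurePosixPath(scope.strip("/")) for scope in declared_scopes]
    paths = [PurePosixPath(path) for path in changed_paths]

    def covered_by(path: PurePosixPath, scope: PurePosixPath) -> bool:
        # A path is covered when it is the scope itself or lives below it.
        # Comparing path components avoids "src/api" matching "src/apiary".
        return path == scope or scope in path.parents

    return {
        # Files the diff touched that no declared scope explains.
        "undeclared_changes": sorted(
            str(p) for p in paths if not any(covered_by(p, s) for s in scopes)
        ),
        # Scopes the PR body promised that the diff never touched.
        "untouched_scopes": sorted(
            str(s) for s in scopes if not any(covered_by(p, s) for p in paths)
        ),
    }
```

Both lists coming back empty means the PR body and the diff tell the same story. Anything in `undeclared_changes` is a file the reviewer would otherwise have to explain by replaying chat.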

Start from the quickstart if the team is new; watch the features page, the slash commands reference, and openai/skills for what each release adds. None of it replaces architecture judgement: agents accelerate execution, not ownership.

One image: a rollback contract is the fire exit map taped by the door. Nobody studies it during the fire, which is exactly why it has to be short.

Best ways to use this research

  • Best for: teams whose last rollback required scrolling chat history to learn what the agent had touched and in what order.
  • Best first artifact: the done checklist; ten lines at the top of AGENTS.md gives every undo a target state.
  • Best comparison angle: read your most recent agent-assisted PR against the four gates and time how long the reviewer path question takes to answer.

Common questions

  • Does Codex CLI 0.124.0 tighten review loops on its own?

    No. Tighter loops come from the rollback contract, not the changelog: a short repo note saying what to undo and in what order, a pinned model, a connector roster, and a done checklist at the top of AGENTS.md. The version bump just sets the stage for the rehearsal.

  • What belongs in a rollback contract for Codex runs?

    The shortest set a reviewer can act on: which scopes the run touched, the verification command output, which MCP connectors fired, and who approved red-folder paths. If a reviewer has to replay chat to answer any of those, the contract is not finished yet.

  • Why pin the model inside AGENTS.md during upgrades?

    Upgrade weeks invite casual model swaps, and casual swaps make review expectations wobble because different models imply different risk appetite. The model pin note fixes the default and the escalation rule, so leads can reason about blast radius while everything else is changing.

  • How do MCP connectors complicate a rollback?

    Each connector expands the blast radius a rollback has to cover, and connectors accumulate quietly until least-privilege has eroded. A Markdown roster checked into the repo root names every server and its owner, which is what lets the undo plan start grounded.
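The contract fields named in these answers can be modelled as a small record with an explicit "not finished yet" test. This is a sketch, not a prescribed schema: the class and field names are hypothetical, and `None` is used here to mean "nobody recorded this", as distinct from an empty list meaning "nothing fired".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RollbackContract:
    """Hypothetical record of what a reviewer needs to undo a Codex run."""

    undo_steps: list[str] = field(default_factory=list)  # what to undo, in order
    scopes_touched: list[str] = field(default_factory=list)
    verification_output: str = ""  # pasted output or a link to it
    # None means unrecorded; an empty list means no connector fired.
    connectors_fired: Optional[list[str]] = None
    # Maps a red-folder path to its approver; None means unrecorded.
    red_folder_approvals: Optional[dict[str, str]] = None

    def unanswered(self) -> list[str]:
        """Name each question a reviewer would still need chat to answer."""
        gaps = []
        if not self.undo_steps:
            gaps.append("undo steps")
        if not self.scopes_touched:
            gaps.append("scopes touched")
        if not self.verification_output.strip():
            gaps.append("verification command output")
        if self.connectors_fired is None:
            gaps.append("MCP connectors fired")
        if self.red_folder_approvals is None:
            gaps.append("red-folder approvals")
        return gaps

    def is_finished(self) -> bool:
        return not self.unanswered()
```

An empty `RollbackContract()` reports all five gaps. A contract is finished only when `unanswered()` returns an empty list, which mirrors the rule that no answer may depend on replaying chat.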

Next move

If your review threads read like archaeology, tell us what the last rollback cost. We will point at the contract that would have caught it.
