Codex CLI workspace tools that make runs reviewable
Codex CLI workspace tools that carry the operating contract: model pin, connector roster, done checklist, and slash catalog for reviewable agent runs.

When a Codex run feels untrustworthy, the fix usually lives in your repo, not in a better model. Codex CLI workspace tools are the checked-in files that tell the agent, and your reviewers, what a run was allowed to do: a model pin, a connector roster, a done checklist, and a slash command catalog. Codex CLI is OpenAI's coding agent, and like any agent it improvises whenever the rules are missing. The job of these files is to make the rules sit in the repo where a teammate can read them six weeks later.
Here's the failure mode they prevent. A change is ready to merge, then someone asks which model produced it and which connectors fired. If the answer lives only in a chat session someone has to reconstruct from memory, the PR stalls. Move the contract into the repo and that question has a one-line answer.
Pin the model so review knows what normal means
Swap default models mid-sprint and your review expectations drift with every run. A stronger model can paper over a weak protocol, and a weaker one can make a perfectly good protocol look flaky. Neither tells you anything useful.
So write the default down. Put the model and the escalation rule in AGENTS.md, near the top, in one place. Now "normal" is a stated fact before anyone starts a run, and "we escalated" is a deliberate, visible choice rather than a quiet drift.
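A pin is only useful if something notices when it goes missing. The sketch below is one way to check for it, assuming a hypothetical convention of `Default model:` and `Escalation:` lines within the first 20 lines of AGENTS.md. Neither the field names nor the line limit come from Codex CLI; they are placeholders for whatever wording your team settles on.

```python
import re

# Hypothetical pin format (an assumption, not a Codex CLI convention):
#   Default model: <name>
#   Escalation: <rule>
PIN_RE = re.compile(r"^\s*[-*]?\s*Default model:\s*(\S.*)$", re.IGNORECASE)
ESCALATION_RE = re.compile(r"^\s*[-*]?\s*Escalation:\s*(\S.*)$", re.IGNORECASE)


def find_model_pin(agents_md, max_line=20):
    """Return the pin fields found in the first `max_line` lines of the text."""
    found = {}
    for line in agents_md.splitlines()[:max_line]:
        pin = PIN_RE.match(line)
        escalation = ESCALATION_RE.match(line)
        if pin and "model" not in found:
            found["model"] = pin.group(1).strip()
        elif escalation and "escalation" not in found:
            found["escalation"] = escalation.group(1).strip()
    return found


def missing_pin_fields(agents_md, max_line=20):
    """List which of the two required fields are absent near the top."""
    found = find_model_pin(agents_md, max_line)
    return [field for field in ("model", "escalation") if field not in found]
```

Calling `missing_pin_fields(open("AGENTS.md").read())` in a pre-merge check would return an empty list when both lines are present near the top, and name the missing field otherwise.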
Keep a connector roster at the repo root
Connectors pile up quietly. Someone wires one in for a one-off task, it stays, and a month later nobody remembers what it can touch. The risk isn't tool sprawl so much as least-privilege quietly eroding.
A checked-in roster fixes this. List each connector and the actions it's allowed to take, right at the repo root. That's the wiring the MCP specification describes at the protocol level, made concrete for your repo. Security review then gets a surface to read instead of a memory test about who enabled what.
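To make the roster something a script can read as well as a person, one option is to keep it as structured data. The sketch below assumes a hypothetical JSON layout (`{"connectors": {name: [allowed actions]}}`) and a hypothetical list of `(connector, action)` pairs pulled from a run transcript. The connector and action names are invented for illustration, and nothing here is part of MCP or Codex CLI.

```python
import json


def load_roster(roster_json):
    """Parse the roster text into {connector name: set of allowed actions}."""
    data = json.loads(roster_json)
    return {name: set(actions) for name, actions in data["connectors"].items()}


def unexpected_calls(roster, calls):
    """Return the (connector, action) pairs that the roster does not allow.

    A connector missing from the roster has no allowed actions at all,
    so every call to it is reported.
    """
    return [(c, a) for c, a in calls if a not in roster.get(c, set())]


# Hypothetical roster content; names are placeholders.
EXAMPLE_ROSTER = """
{"connectors": {
    "issue-tracker": ["read_issue"],
    "docs-search": ["search"]
}}
"""
```

With `roster = load_roster(EXAMPLE_ROSTER)`, passing the calls observed in a run to `unexpected_calls` yields exactly the entries a security reviewer would want to ask about.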
Cap AGENTS.md with a definition of done
AGENTS.md turns into a junk drawer the moment it tries to explain everything. Once it's long enough, Codex starts optimizing against the wrong idea of "done," and your reviewers lose track of what the run was supposed to satisfy.
Keep a Definition of Done block near the top, ten bullets or fewer. Completion criteria stay visible before the agent starts branching off. Here's a small verification snippet you can paste straight in and trim to taste:
# AGENTS.md verification snippet
- Every Codex CLI run ends with the transcript snippet reviewers can replay.
- Pair browser evidence with the project's normal CLI checks before merge.
- If MCP servers are enabled, list allowed actions beside each connector name.
- State the default model and the escalation rule in one place.
- Link the command catalog from the workspace root.
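The ten-bullet cap is easy to state and easy to let slip, so it may be worth checking mechanically. This is a minimal sketch, assuming the block sits under a markdown heading whose text is "Definition of Done" and uses `-` or `*` bullets. The heading text and function names are assumptions for illustration.

```python
def done_bullets(agents_md, heading="definition of done"):
    """Collect the bullets that sit under the Definition of Done heading."""
    bullets = []
    in_block = False
    for line in agents_md.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Any heading either opens the block or closes it.
            in_block = stripped.lstrip("#").strip().lower() == heading
            continue
        if in_block and stripped.startswith(("- ", "* ")):
            bullets.append(stripped[2:].strip())
    return bullets


def within_cap(agents_md, limit=10):
    """True when the block exists and has at most `limit` bullets."""
    count = len(done_bullets(agents_md))
    return 0 < count <= limit
```

Treating an empty or missing block as a failure is a deliberate choice here: a file with no completion criteria is the case the cap is meant to prevent.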
Catalog your slash commands
Undocumented slash commands quietly split your team. One person assumes a command is safe and shared, another assumes it's local-only, and they're both confident. That gap shows up at the worst time.
Write them down. Keep a catalog in docs/codex-commands.md, link it from AGENTS.md, and check it against the official slash commands reference. Command usage becomes a repo artifact anyone can read, not tribal memory.
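A catalog drifts the same way a roster does, so a small check can help keep it honest. The sketch below assumes catalog entries wrap each command in backticks (for example `` `/team-lint` ``) and that the commands used in a run are available as a plain list. Both the entry format and the command names are hypothetical, not taken from the official reference.

```python
import re

CATALOG_PATH = "docs/codex-commands.md"  # path named in this article

# Matches a backticked token such as `/team-lint` (assumed entry format).
COMMAND_RE = re.compile(r"`(/[a-z][a-z0-9-]*)`")


def links_catalog(agents_md):
    """True when AGENTS.md mentions the catalog path anywhere in its text."""
    return CATALOG_PATH in agents_md


def catalog_commands(catalog_md):
    """Return the set of slash commands documented in the catalog."""
    return set(COMMAND_RE.findall(catalog_md))


def uncataloged(used, catalog_md):
    """Return commands that were used in a run but are not in the catalog."""
    return sorted(set(used) - catalog_commands(catalog_md))
```

Anything `uncataloged` returns is a command somebody relies on that the rest of the team cannot look up.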
Run the review gate before you merge
Once the four files exist, you have a short gate to walk every agent PR through. The point is simple: can the repo answer these questions without the original operator in the room?
| Gate | Question |
|---|---|
| Replay proof | Which commands prove the change is safe to merge? |
| Receipt match | Does the PR body list scope and the verification transcript? |
| Rules precedence | Which AGENTS.md, .mdc, SKILL.md, or CLAUDE.md file governed the run? |
| Connector truth | Which MCP servers fired, and were they expected? |
| Model pin | Is the default model named, and is escalation explicit? |
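The gate in the table can also be checked against a PR description. This is a rough sketch, assuming a hypothetical PR-body convention in which each gate appears as a `Gate name: answer` line. The convention and the function name are illustrative, and the check only confirms an answer is present, not that it is a good one.

```python
GATES = [
    "Replay proof",
    "Receipt match",
    "Rules precedence",
    "Connector truth",
    "Model pin",
]


def unanswered_gates(pr_body):
    """Return the gates with no non-empty 'Gate name: answer' line."""
    answered = {}
    for line in pr_body.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # Drop list markers so "- Model pin: ..." and "Model pin: ..." both count.
            answered[key.strip().lstrip("-* ").lower()] = value.strip()
    return [gate for gate in GATES if not answered.get(gate.lower())]
```

An empty result means the PR body at least names an answer for every row; a human still has to judge whether each answer holds up.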
The Codex CLI features page documents what runs can do; your workspace decides what they may do. New teams can stand up the binary from the Codex quickstart and track releases in the openai/codex repository. None of those will write the contract for you. That part is on the repo, and it's the part that survives a reviewer who wasn't there.
A couple of honest limits. These files don't replace human judgment on threat models, customer commitments, or blast radius; they make those calls easier to audit, not easier to hand off. And a stale roster or catalog is worse than none, because it reads as truth while lying. Keep the artifact in the repo, where a diff catches drift, not in a slide deck.
Common questions
- What are Codex CLI workspace tools?
  They're the repo files that carry your team's operating contract for agent runs: a model pin and escalation rule in AGENTS.md, a connector roster with allowed actions at the repo root, a ten-bullet definition of done, and a slash command catalog in docs/codex-commands.md linked from the workspace root. Together they let a reviewer reconstruct what a run was allowed to do.
- Why pin the default model in AGENTS.md?
  Because casual model swaps make review expectations wobble. A stronger model can hide a weak protocol, and a weaker one can make a good protocol look unreliable, so neither swap is informative. The pin states what normal looks like, and the escalation rule says, in the open, when a run is allowed to deviate from it.
- Where should the MCP connector roster live?
  Checked into the repo root, with allowed actions listed beside each connector name. Connectors accumulate quietly and least-privilege erodes with each addition, so the roster hands security review a concrete surface to read. That beats a memory test about who enabled which connector, when, and for what one-off task.
- How long should AGENTS.md get?
  Short enough that the definition of done stays visible: a block of ten bullets or fewer near the top. Once the file tries to explain everything, Codex starts optimizing against the wrong version of done, and reviewers lose the completion criteria the run was meant to satisfy. When in doubt, link out to a longer doc instead of inlining it.
Where to start
Pick one of the four files (model pin, roster, done checklist, or catalog) and commit it before your next automated run. The drill library lives under CLI workflows when you're ready to wire up the rest.