
Codex workflows: governance that lives in the repo

How to govern Codex workflows from the repo: a connector roster, a ten-line done checklist, a slash catalog, and a verification latch reviewers can replay.

Twilight in the Wilderness, landscape painting by Frederic Edwin Church (1860).
Rogier Muller · May 14, 2026 · 6 min read

Govern your Codex workflows by writing the rules into the repo, not into chat history. A governed workflow is one where the boundaries live in AGENTS.md, the tool access lives in a connector roster, and every run ends with verification a reviewer can replay. Codex CLI, OpenAI's coding agent, makes a laptop productive in an afternoon. It does nothing to answer the question that comes up later: who owns this merge?

That question is the whole game. Onboarding docs tell you how to run the agent. Governance tells you who is accountable for what it produces. Code review depends on the second, so that is what this piece sets up.

Put the rules where the agent reads them

The Codex quickstart gets you to your first run fast, and the Codex CLI docs describe everything the tool can do. None of it decides what the tool may do in your repo. That decision belongs to you, and the place to record it is AGENTS.md at the repo root.

AGENTS.md is the file Codex reads to learn your project's rules. Keep it short and it stays true. Let it grow a line per incident and it becomes a document nobody recognizes, with the agent optimizing for a definition of "done" that has drifted away from your team's.

So lead with a Definition of Done in ten lines or fewer, at the very top of the file. Everything below it is detail. The first ten lines are the contract.
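A small CI check can keep that ten-line budget honest. This is a minimal sketch, assuming the Definition of Done sits under the first heading in AGENTS.md and that the heading text contains "Definition of Done". The function name and the heading convention are illustrative; Codex itself does not read or enforce them.

```python
MAX_DOD_LINES = 10  # the ten-line budget for the Definition of Done


def definition_of_done_problems(agents_md: str, max_lines: int = MAX_DOD_LINES) -> list[str]:
    """Return problems with the Definition of Done at the top of AGENTS.md.

    Hypothetical convention: the DoD is the section under the first Markdown
    heading, and that heading contains the words "Definition of Done".
    Any line starting with "#" is treated as a heading, so this sketch does
    not understand fenced code blocks inside AGENTS.md.
    """
    lines = agents_md.splitlines()
    heading_idx = next(
        (i for i, line in enumerate(lines) if line.lstrip().startswith("#")), None
    )
    if heading_idx is None:
        return ["no headings found"]

    problems = []
    # The contract should be the first thing in the file.
    if any(line.strip() for line in lines[:heading_idx]):
        problems.append("content appears before the Definition of Done")
    if "definition of done" not in lines[heading_idx].lower():
        problems.append("first heading is not a Definition of Done")

    # Count non-blank lines until the next heading.
    body = []
    for line in lines[heading_idx + 1:]:
        if line.lstrip().startswith("#"):
            break
        if line.strip():
            body.append(line)
    if len(body) > max_lines:
        problems.append(
            f"Definition of Done has {len(body)} lines; the budget is {max_lines}"
        )
    return problems


if __name__ == "__main__":
    sample = "# Definition of Done\n- Tests pass locally\n- Lint is clean\n\n## Details\n..."
    print(definition_of_done_problems(sample))  # []
```

An empty list means the contract is at the top and within budget; anything else is a reason to trim before the file sprawls.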

Name every connector and its owner

The quietest way a Codex setup goes wrong is connector sprawl. MCP servers get added one Slack thread at a time, each one widens what the agent can touch, and least-privilege erodes without anyone deciding to weaken it.

MCP is the Model Context Protocol, the shared interface Codex uses to talk to external tools and data. The fix is a roster: a Markdown file at the repo root that names every server, its owner, and its allowed actions. Use the MCP specification as the shared vocabulary so a security review starts from a list instead of from archaeology.

When the roster exists, the answer to "which connector wrote this?" is a file lookup, not a scroll through chat.
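Once the roster has a fixed shape, that lookup can be scripted. The sketch below assumes a hypothetical layout: a Markdown table with Server, Owner, and Allowed actions columns, with actions separated by commas. It rejects rows that have no owner and reports any server that fired during a run but never made it onto the roster. The server names in the example are illustrative, and how you collect the list of servers that fired depends on your own logging, so it is simply an input here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    server: str
    owner: str
    allowed_actions: tuple[str, ...]


def parse_roster(markdown: str) -> dict[str, Connector]:
    """Parse a roster table with the columns: Server | Owner | Allowed actions.

    The column order and the comma-separated action list are assumptions of
    this sketch, not part of the MCP specification.
    """
    rows = [line.strip() for line in markdown.splitlines() if line.strip().startswith("|")]
    roster: dict[str, Connector] = {}
    for row in rows[2:]:  # skip the header row and the |---| separator
        cells = [cell.strip() for cell in row.strip("|").split("|")]
        if len(cells) != 3:
            raise ValueError(f"expected 3 columns, got {len(cells)}: {row!r}")
        server, owner, actions = cells
        if not owner:
            raise ValueError(f"connector {server!r} has no owner")
        roster[server] = Connector(
            server=server,
            owner=owner,
            allowed_actions=tuple(a.strip() for a in actions.split(",") if a.strip()),
        )
    return roster


def unexpected_connectors(fired: list[str], roster: dict[str, Connector]) -> list[str]:
    """Servers that fired during a run but are missing from the roster."""
    return sorted(set(fired) - set(roster))


if __name__ == "__main__":
    roster = parse_roster(
        "| Server | Owner | Allowed actions |\n"
        "|---|---|---|\n"
        "| github | @platform-team | read issues, open pull requests |\n"
    )
    print(unexpected_connectors(["github", "slack"], roster))  # ['slack']
```

Failing loudly on a missing owner is deliberate: a connector nobody owns is exactly the kind of quiet addition the roster exists to surface.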

End every run with replayable proof

The expensive failure is verification bypass: an exec shortcut skips the tests, a regression quietly slips back in, and speed wins once where discipline would have won every week after. The fix is a verification latch, and it is a standing rule, not a polite request: every codegen run ends with a transcript snippet showing the tests actually ran.

Drop this into AGENTS.md so the rule travels with the repo:

```markdown
# AGENTS.md verification snippet

- Every Codex CLI run ends with the transcript snippet reviewers can replay.
- Pair browser evidence with the project's normal CLI checks before merge.
- If MCP servers are enabled, list allowed actions beside each connector name.
```
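A CI step can refuse a run whose transcript lacks that proof. The sketch below is written for a Python project and rests on two hypothetical conventions: the snippet prefixes shell commands with "$ ", and the tests run under pytest with its default "=== N passed in Xs ===" summary banner. Running pytest with -q prints the summary without the banner, so it would not match. Swap both patterns for your own test runner.

```python
import re

# Hypothetical conventions: commands are prefixed with "$ " and the suite runs under pytest.
TEST_COMMAND = re.compile(r"^\$ (?:pytest|python -m pytest)\b", re.MULTILINE)
# pytest's default summary banner, e.g. "===== 3 passed in 0.12s =====".
SUMMARY_LINE = re.compile(r"^=+ (.+?) in [\d.]+s\b.*=+[ \t]*$", re.MULTILINE)


def latch_satisfied(transcript: str) -> bool:
    """True only if the test command ran and the last summary reports no failures or errors."""
    if not TEST_COMMAND.search(transcript):
        return False  # no evidence the tests were invoked at all
    summaries = SUMMARY_LINE.findall(transcript)  # one capture group, so a list of strings
    if not summaries:
        return False  # the command started but never reported a result
    last = summaries[-1]
    return "passed" in last and "failed" not in last and "error" not in last


if __name__ == "__main__":
    snippet = "$ pytest\ncollected 3 items\n\n============ 3 passed in 0.12s ============\n"
    print(latch_satisfied(snippet))  # True
```

Checking only the last summary means a transcript that fixed a failing test and then reran the suite still passes the latch.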

One more habit prevents the slow drift between operators: a slash catalog. The slash commands reference covers the built-ins, but nobody ships docs for the commands your team invented. Keep the catalog in docs/codex-commands.md, link it from AGENTS.md, and two people stop running the same task two different ways.
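Two kinds of drift are cheap to catch mechanically: AGENTS.md losing its link to the catalog, and the same command name being defined twice. The sketch below assumes a hypothetical catalog format of one bullet per command, with the command name in backticks at the start of the bullet; names such as /triage and /release-notes are illustrative only.

```python
import re
from collections import Counter

CATALOG_PATH = "docs/codex-commands.md"
# Hypothetical catalog format: "- `/command-name`: what it does"
COMMAND_ENTRY = re.compile(r"^[ \t]*[-*][ \t]+`(/[\w-]+)`", re.MULTILINE)


def catalog_problems(agents_md: str, catalog_md: str) -> list[str]:
    """List ways the slash catalog has drifted from AGENTS.md or from itself."""
    problems = []
    if CATALOG_PATH not in agents_md:
        problems.append(f"AGENTS.md does not link {CATALOG_PATH}")
    names = COMMAND_ENTRY.findall(catalog_md)
    if not names:
        problems.append("catalog lists no commands")
    # One name with two definitions is how two operators end up running one task two ways.
    for name, count in sorted(Counter(names).items()):
        if count > 1:
            problems.append(f"{name} is defined {count} times")
    return problems


if __name__ == "__main__":
    agents = "Team commands live in docs/codex-commands.md."
    catalog = "- `/triage`: label new issues\n- `/triage`: assign new issues\n"
    print(catalog_problems(agents, catalog))  # ['/triage is defined 2 times']
```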

Give reviewers four answers up front

A reviewer should be able to trust an agent-assisted merge without replaying the session. Four questions get them there, and the repo should answer all four on its own:

| Gate | Question |
|---|---|
| Connector truth | Which MCP servers fired, and were they expected? |
| Reviewer path | Can someone unfamiliar trace intent without chat replay? |
| Risk routing | Were red folders touched, and who approved? |
| Replay proof | Which commands prove regression guards? |
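Scoring a past merge against these gates can be as simple as recording, for each gate, the repo file that answered it. In this hypothetical helper a gate counts as answered only when a file answered it; anything a reviewer had to ask about in chat is recorded as None or left out. The data shape, the function name, and the MCP_CONNECTORS.md filename in the example are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gate:
    name: str
    question: str


GATES = (
    Gate("Connector truth", "Which MCP servers fired, and were they expected?"),
    Gate("Reviewer path", "Can someone unfamiliar trace intent without chat replay?"),
    Gate("Risk routing", "Were red folders touched, and who approved?"),
    Gate("Replay proof", "Which commands prove regression guards?"),
)


def score(answers: dict[str, Optional[str]]) -> tuple[int, list[str]]:
    """Count gates answered from repo files alone.

    `answers` maps a gate name to the file that answered it, or None if the
    answer only existed in chat. Missing keys count as unanswered.
    """
    unanswered = [gate.name for gate in GATES if not answers.get(gate.name)]
    return len(GATES) - len(unanswered), unanswered


if __name__ == "__main__":
    answered, missing = score({"Connector truth": "MCP_CONNECTORS.md", "Replay proof": "PR description"})
    print(answered, missing)  # 2 ['Reviewer path', 'Risk routing']
```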

Then make the pull request carry the receipts. Paste this checklist into your PR template:

  • MCP connectors mentioned (if any) list owners.
  • Verification command output is pasted or linked.
  • Forked agent work lists parent and child responsibilities.
  • Red-folder paths received explicit human acknowledgement.
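If the template uses GitHub-style task-list boxes, a CI job can read the PR body and fail until the receipts are there. The sketch below makes several hypothetical assumptions: the four items appear verbatim as "- [x]" lines, the red folders are infra/, migrations/, and src/auth/, and approval for red-folder changes is recorded as a "Red-folder approval: @handle" line. It treats every item as mandatory, so authors tick items that do not apply. The list of changed paths would come from your CI system and is an input here.

```python
import re

# The four items from the PR template, matched verbatim.
REQUIRED_ITEMS = (
    "MCP connectors mentioned (if any) list owners.",
    "Verification command output is pasted or linked.",
    "Forked agent work lists parent and child responsibilities.",
    "Red-folder paths received explicit human acknowledgement.",
)
# Hypothetical red folders; each team defines its own.
RED_FOLDERS = ("infra/", "migrations/", "src/auth/")

CHECKED_ITEM = re.compile(r"^[ \t]*[-*] \[[xX]\] (.+?)[ \t]*$", re.MULTILINE)
# Hypothetical convention for naming the human who approved red-folder changes.
APPROVAL = re.compile(r"^Red-folder approval: @[\w-]+", re.MULTILINE)


def red_folder_paths(changed_paths: list[str]) -> list[str]:
    """Changed paths that fall under a red folder."""
    return sorted(p for p in changed_paths if p.startswith(RED_FOLDERS))


def pr_problems(pr_body: str, changed_paths: list[str]) -> list[str]:
    """Reasons this PR does not yet carry its receipts."""
    checked = set(CHECKED_ITEM.findall(pr_body))
    problems = [f"unchecked: {item}" for item in REQUIRED_ITEMS if item not in checked]
    touched = red_folder_paths(changed_paths)
    if touched and not APPROVAL.search(pr_body):
        problems.append("red folders touched without a named approver: " + ", ".join(touched))
    return problems
```

A ticked box alone does not satisfy the red-folder gate here: the sketch also wants a named person, which keeps "who approved?" answerable from the PR itself.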

Releases in openai/codex will keep changing what the CLI can do, and the features page will keep growing. None of that changes who owns the merge. Agents speed up execution. Ownership and architecture judgement stay with a person.

Common questions

  • Who owns truth in a governed Codex workflow?

    AGENTS.md owns truth, and everything else hangs off that rule. Four guardrails keep it honest: a connector roster against MCP privilege creep, a ten-line done checklist against an AGENTS.md that sprawls, a slash catalog against operators diverging, and a verification latch against skipped tests. Lose AGENTS.md as the source of truth and the four guardrails stop meaning anything.

  • Why keep the connector roster in the repo root?

    The repo root is where a security reviewer looks first and where the agent's working directory begins, so the roster is found without hunting. Connectors that accumulate quietly erode least-privilege, and each new MCP server widens the blast radius. A roster naming every server, its owner, and its allowed actions lets a review start grounded instead of guessing what fired.

  • How does the verification latch work day to day?

    Every codegen run ends with a transcript snippet showing the tests ran, required rather than requested. It only works as a standing rule, because speed wins once and discipline wins weekly. Once the latch is in place, a green merge starts correlating with an actual test run again, which is the whole point of pasting the proof.

  • What questions gate a governed merge?

    Four: which MCP servers fired and were they expected, can someone unfamiliar trace intent without chat replay, were red folders touched and who approved, and which commands prove the regression guards. The PR checklist backs each one with owners, pasted verification output, parent and child responsibilities for forked work, and explicit acknowledgement of red-folder paths.

  • Which guardrail should a team install first?

    Start with the connector roster. It takes one sitting, and it turns your next security review from archaeology into a checklist. From there, add the ten-line done checklist, then the slash catalog, then the verification latch, in roughly that order of effort.

Try it on one repo

Pick a single repo, hold the four gates against its last agent-assisted merge, and count how many a stranger could answer from the files alone. If you want to install all four guardrails with the whole team in the room, that live setup is what our training is built around: bring one repo, leave with the roster, checklist, catalog, and latch in place.
