Codex CLI, GitHub, and MCP

A Codex CLI workflow guide covering GitHub, AGENTS.md, MCP, and reviewable diffs in production repos.

Construction of an Elevated Railway: Bridge over the Cours de Vincennes, landscape painting by Paul Désiré Trouillebert (1888).
Rogier Muller · May 21, 2026 · 7 min read

Codex CLI works best when you give it three things up front: a place to read your repo rules, a clear boundary for what it can touch, and a command that proves each change. Codex CLI is OpenAI's coding agent that runs in your terminal, reads the repo, edits files, and runs commands. The fast version of this workflow is the boring one, where the guardrails are written down before the first task.

Most teams do not stall because they need another agent. They stall because the work is hard to review, hard to verify, and hard to keep inside repo rules when the clock is loud. The good news is that Codex exposes its workflow surface: it reads repo instructions, can be automated from the CLI, and sits next to GitHub and MCP integrations. That means you can standardize it, as long as you do the standardizing first.

Give Codex the repo rules with AGENTS.md

Codex reads an AGENTS.md file to learn how your repo wants to be worked on. Keep one at the root for repo-wide rules, then add nested AGENTS.md files in subdirectories where the local rules differ. The agent reads the same constraints your humans read, so review gets easier.

Keep each rule short: one line for architecture, one for the tests you expect, one for the areas that are unsafe to touch. Let a nested file carry the local exceptions so the root file stays readable.

# AGENTS.md (repo root)

## Architecture
- API handlers live in `src/api`. Shared logic goes in `src/lib`, never in handlers.

## Tests
- Run `pnpm test` before finishing. New behavior needs a test.

## Do not touch
- Generated files in `src/db/migrations`. Ask before editing auth code.

The whole point is discoverability. A rule the agent cannot find is a rule that does not exist, so write it where Codex actually looks.
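To picture how a root file and nested files stack up, here is a minimal TypeScript sketch that lists which AGENTS.md files would apply to a given path. It assumes nested files refine the root file and that the nearest one comes last. The helper `agentsChainFor` and the example paths are hypothetical, and this is an illustration rather than Codex's actual lookup logic.

```typescript
// Hypothetical helper: lists the AGENTS.md files that would apply to a file,
// ordered from the repo root down to the nearest directory.
// Assumption: nested files refine the root file; Codex's real lookup may differ.
function agentsChainFor(filePath: string, knownAgentsFiles: string[]): string[] {
  // Directory segments of the target file, e.g. "src/api/users.ts" -> ["src", "api"].
  const dirs = filePath.split("/").slice(0, -1);
  const chain: string[] = [];
  // Walk from the repo root (depth 0) down to the file's own directory.
  for (let depth = 0; depth <= dirs.length; depth++) {
    const dir = dirs.slice(0, depth).join("/");
    const candidate = dir === "" ? "AGENTS.md" : dir + "/AGENTS.md";
    if (knownAgentsFiles.indexOf(candidate) !== -1) {
      chain.push(candidate);
    }
  }
  return chain;
}

// Example repo: a root file plus one nested file for the API handlers.
const agentsFiles = ["AGENTS.md", "src/api/AGENTS.md"];
const apiChain = agentsChainFor("src/api/users.ts", agentsFiles);
const libChain = agentsChainFor("src/lib/cache.ts", agentsFiles);
```

Here `apiChain` holds the root file followed by `src/api/AGENTS.md`, while `libChain` holds only the root file. A check like this can flag directories whose local rules exist only in someone's head.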

Run a verification loop, not a chat

The habit that changes outcomes is the verification loop: ask for a change, run the checks, read the diff, then ask for the next pass. Every Codex task should end with a command, a test, or a build step that proves the change in the repo. That moves the conversation from intent to evidence.

A good first line for any task is one question: what must be true when this is done? If you can answer that with a command, you have a loop. If you cannot, the task is too vague to hand off yet.

Codex supports CLI automation and slash commands for exactly this kind of repeatable flow. Use them to wire the check into the task instead of running it from memory afterward.
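One way to make the "what must be true" question hard to skip is to refuse any task brief that lacks a check command. The sketch below is a minimal illustration in TypeScript. The `TaskBrief` shape, its field names, and both helpers are hypothetical and not part of Codex.

```typescript
// Hypothetical task brief: the goal plus the command that proves it.
interface TaskBrief {
  goal: string;
  verifyCommand?: string; // e.g. "pnpm test"
}

// A brief is ready to hand off only when it names a non-empty check command.
function isReadyToHandOff(brief: TaskBrief): boolean {
  return typeof brief.verifyCommand === "string" && brief.verifyCommand.trim().length > 0;
}

// Builds the prompt text, appending the check so it is part of the task itself.
function toPrompt(brief: TaskBrief): string {
  if (!isReadyToHandOff(brief)) {
    throw new Error('Too vague to hand off: no verification command for "' + brief.goal + '"');
  }
  return brief.goal + "\nDone when this passes: " + brief.verifyCommand;
}

// One brief with a check, one without.
const readyBrief: TaskBrief = { goal: "Cache user lookups in src/lib/users.ts", verifyCommand: "pnpm test" };
const vagueBrief: TaskBrief = { goal: "Make the profile page faster" };
```

Calling `toPrompt(readyBrief)` returns the goal with the check attached, and `toPrompt(vagueBrief)` throws. That is the point: the vague task gets stopped before it reaches the agent.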

Set an MCP boundary before the first task

MCP, the Model Context Protocol, is the connector boundary between Codex and outside systems like GitHub, Slack, or a database. It is a boundary, not a free pass to everything in the company. The official docs place MCP next to integrations and permissions on purpose: scope changes risk.

So write the boundary down before you start. List which systems Codex may touch, which actions are read-only, and which actions need a human review first. This is where Codex and GitHub workflows get concrete: the agent can open, edit, and verify code, while the team still decides how far its external reach goes.

| System        | Read | Write | Needs review   |
| ------------- | ---- | ----- | -------------- |
| GitHub repo   | yes  | yes   | merges to main |
| Slack         | yes  | no    | any post       |
| Production DB | no   | no    | all access     |

Adjust the rows to your stack. The value is in having the table at all, because it turns "what can the agent do" from a guess into a written answer.
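If you want the written boundary to be checkable, the same table can be expressed as data. The sketch below is one possible encoding in TypeScript, not an MCP or Codex API. The system keys, the operation names such as `merge-to-main`, and the `decide` function are all hypothetical. It also assumes that anything not listed is denied.

```typescript
type Decision = "allow" | "needs-review" | "deny";
type Action = "read" | "write";

interface BoundaryRule {
  read: boolean;
  write: boolean;
  // Returns true when this specific operation needs a human first.
  needsReview: (action: Action, operation: string) => boolean;
}

// The boundary table as data. Keys and operation names are illustrative.
const boundary: { [system: string]: BoundaryRule } = {
  "github": { read: true, write: true, needsReview: (_action, operation) => operation === "merge-to-main" },
  "slack": { read: true, write: false, needsReview: (action) => action === "write" },
  "production-db": { read: false, write: false, needsReview: () => true },
};

function decide(system: string, action: Action, operation: string): Decision {
  // Assumption: systems that are not listed are out of bounds.
  if (!Object.prototype.hasOwnProperty.call(boundary, system)) return "deny";
  const rule = boundary[system];
  if (rule.needsReview(action, operation)) return "needs-review";
  return rule[action] ? "allow" : "deny";
}
```

With these rows, `decide("github", "write", "push-branch")` returns `"allow"`, `decide("github", "write", "merge-to-main")` returns `"needs-review"`, and any system missing from the table returns `"deny"`.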

Hand reviewers a story, not just a patch

A clean patch is not the same as a reviewable one. When Codex changes code, tests, and config in one pass, the reviewer needs a short handoff: what changed, why, and how it was checked. Keep it to three lines in the PR description.

Goal: cache user lookups to cut p95 on /profile.
Files: src/lib/users.ts, src/lib/cache.ts, users.test.ts.
Checks: pnpm test passed, p95 dropped from 240ms to 90ms locally.

That receipt lowers review time and leaves a paper trail for the next person who asks why the change exists. When a repo task keeps coming back, go one step further and capture the repeated steps as a reusable skill or saved command instead of retyping the brief. That is how a personal habit becomes a team workflow.
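The three-line note is regular enough to generate. Here is a minimal TypeScript sketch of a formatter that refuses to produce a note without files or checks. The `Handoff` type and `formatPrNote` are hypothetical names, and the sample values are the ones from the note above.

```typescript
// Hypothetical shape for the reviewer handoff: goal, files touched, checks run.
interface Handoff {
  goal: string;
  files: string[];
  checks: string[];
}

// Formats the three-line PR note. Throws if there is nothing to review or verify.
function formatPrNote(handoff: Handoff): string {
  if (handoff.files.length === 0) {
    throw new Error("PR note needs at least one file");
  }
  if (handoff.checks.length === 0) {
    throw new Error("PR note needs at least one check");
  }
  return [
    "Goal: " + handoff.goal + ".",
    "Files: " + handoff.files.join(", ") + ".",
    "Checks: " + handoff.checks.join(", ") + ".",
  ].join("\n");
}

const prNote = formatPrNote({
  goal: "cache user lookups to cut p95 on /profile",
  files: ["src/lib/users.ts", "src/lib/cache.ts", "users.test.ts"],
  checks: ["pnpm test passed", "p95 dropped from 240ms to 90ms locally"],
});
```

Throwing on an empty `checks` list is deliberate: a note that cannot say how the change was verified should not reach a reviewer.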

A copyable checklist

Paste this into your repo as a starting point for a Codex CLI setup.

# Codex CLI operating checklist

- [ ] Root `AGENTS.md` exists and names repo-wide rules.
- [ ] Nested `AGENTS.md` files cover local exceptions.
- [ ] Every Codex task ends with a verification command.
- [ ] The PR note says goal, files, and checks run.
- [ ] MCP access is listed with read/write boundaries.
- [ ] Repeated tasks are captured as a skill or saved command.
- [ ] Reviewers can reproduce the result from the repo alone.
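If the checklist lives in the repo, a small script can report which boxes are still open. The sketch below is one way to do that in TypeScript, assuming the `- [ ]` and `- [x]` task-list syntax used above. The function name `uncheckedItems` is hypothetical.

```typescript
// Returns the text of every unchecked "- [ ]" item in a markdown checklist.
function uncheckedItems(markdown: string): string[] {
  const open: string[] = [];
  // Split on LF or CRLF so Windows-style files parse the same way.
  for (const line of markdown.split(/\r?\n/)) {
    const match = /^\s*- \[( |x|X)\] (.+)$/.exec(line);
    // Group 1 is the box content; a single space means the item is still open.
    if (match && match[1] === " ") {
      open.push(match[2]);
    }
  }
  return open;
}

// A partly completed copy of the checklist.
const sampleChecklist = [
  "# Codex CLI operating checklist",
  "",
  "- [x] Root `AGENTS.md` exists and names repo-wide rules.",
  "- [ ] Every Codex task ends with a verification command.",
  "- [ ] MCP access is listed with read/write boundaries.",
].join("\n");

const stillOpen = uncheckedItems(sampleChecklist);
```

For this sample, `stillOpen` contains the two unchecked items and skips both the heading and the checked line.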

More structure means a little more setup and a lot less drift later. If your repo is tiny, a full instruction chain may feel heavy. If your repo is real, the structure pays for itself the first time a reviewer does not have to guess.

Common questions

  • What is AGENTS.md and where does it go?

    AGENTS.md is a plain markdown file that tells Codex how to work in your repo. Put one at the root for repo-wide rules, and add nested AGENTS.md files in subfolders where local rules differ. Codex reads them automatically, so the file is the place to encode architecture norms, test expectations, and areas that are unsafe to touch.

  • Do I need MCP to use Codex CLI?

    No. Codex CLI can read, edit, and run commands in your repo without any MCP servers connected. You add MCP when you want Codex to reach outside systems like GitHub, Slack, or a database. When you do, write a short boundary note first that says which systems it may touch and which actions need review.

  • How do I keep Codex diffs reviewable?

    End every task with a verification command and a three-line PR note covering goal, files touched, and checks run. The command proves the change works, and the note gives the reviewer the story behind the patch. Reviewers should be able to reproduce the result from the repo alone, without needing to reconstruct your prompts.

  • How does Codex CLI work with GitHub?

    Codex can open, edit, and verify code in a GitHub repo, then hand you a diff to review and merge. Treat merge permissions as a boundary you control, not something the agent decides. Keep GitHub write access scoped in your MCP note, and require human review on merges to main so the team stays in the loop.

  • Should I turn repeated tasks into skills?

    Yes, once a task shows up more than twice. Capture the repeated steps as a shared skill or saved command, keep the description sharp, and let Codex invoke it when the task matches. This stops people from retyping the same brief and makes the workflow consistent across the team instead of living in one person's head.

Start with one repo

Pick one repo, add an AGENTS.md and the checklist above, and run a single task through the full loop. Move the same setup into the next service only after reviewers can trust the first one. For more on this, see our Codex CLI workflows topic and how we run training on it.
