Run Codex as an Engineering Team

Team workflow for Codex CLI, AGENTS.md, verification loops, and reviewable diffs after Codex Remote GA.

Rogier Muller · June 28, 2026 · 9 min read

The best practice is to treat OpenAI Codex, OpenAI's coding agent, like a junior teammate with clear repo rules, small tasks, automated checks, and human review. Use Codex CLI for tight local work and Codex Remote for delegated work that should come back as a reviewable change, especially now that OpenAI's June 25, 2026 changelog lists Codex Remote as generally available.

Codex CLI is the command-line workflow for asking Codex to inspect, edit, and verify code inside a repository. The team win is not more prompts; it is a shared operating loop for instructions, execution, verification, and review. For a broader path through these patterns, start with the related training topic.

Start with team rules, not better prompts

Put durable engineering rules in AGENTS.md before you ask Codex to touch production code. This gives every Codex agent the same map: project shape, test commands, forbidden shortcuts, architecture boundaries, and review expectations.

A useful AGENTS.md is short enough to be read and strict enough to change behavior. In a Rails repo, the root file might say that billing code requires approval from the payments owner, while app/jobs/AGENTS.md says background jobs must be idempotent and covered by retry tests.
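To make the root-plus-nested layout concrete, here is a minimal sketch of which instruction files would apply to a given file, assuming rules are read from the repository root down to the file's own directory. The helper name `applicable_agents_files` is hypothetical, and the sketch illustrates the scoping idea rather than how Codex itself discovers AGENTS.md files.

```python
from pathlib import Path


def applicable_agents_files(repo_root: Path, target: Path) -> list[Path]:
    """Return AGENTS.md files from the repo root down to the target's directory.

    Hypothetical helper: it illustrates root-plus-nested scoping and does not
    claim to reproduce Codex's own file discovery.
    """
    repo_root = repo_root.resolve()
    target = target.resolve()
    # Start from the directory that contains the target file.
    start = target if target.is_dir() else target.parent
    # Raises ValueError if the target lives outside the repository.
    relative = start.relative_to(repo_root)

    found = []
    current = repo_root
    # Visit the root first, then each directory on the way down to the target.
    for part in ("", *relative.parts):
        current = current / part
        candidate = current / "AGENTS.md"
        if candidate.is_file():
            found.append(candidate)
    return found
```

For a file under app/jobs/, this returns the root AGENTS.md followed by app/jobs/AGENTS.md, which matches the order a reviewer would expect: general rules first, then the stricter local ones.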

The trap is treating AGENTS.md as a dumping ground for every preference the team has ever debated. Keep durable rules there; put task-specific context in the Codex prompt or issue. For a deeper pattern, see AGENTS.md for Codex Teams.

Running OpenAI Codex with an engineering team works best when these rules are versioned, reviewed, and owned like code. That is the quiet governance habit that makes engineering team AI adoption less weird: the team reviews the operating model, not just the generated diff.

Choose the right Codex loop for the job

Use Codex CLI when the engineer is actively steering the work. It fits refactors, failing-test loops, migrations, and codebase exploration where the human wants to inspect each step.

Use Codex Remote when the work can be delegated and reviewed later. The official changelog matters here because general availability changes the planning question from whether the workflow is experimental to where it belongs in the team system.

| Criteria | Codex CLI local loop | Codex Remote delegated loop |
| --- | --- | --- |
| Best fit | Active pairing in a local repo, especially when the engineer wants to run checks and refine quickly | Ticket-shaped work that can be handed off and reviewed as a completed change |
| Instruction surface | AGENTS.md, prompt context, slash commands, local repo state | Same team rules, but packaged as a clearer task because the agent is less conversational |
| Verification habit | Run tests, linters, type checks, and targeted commands before opening a PR | Require Codex to report what it changed, what it ran, and what still needs human judgment |
| Review shape | Small diffs reviewed while the author still remembers the reasoning | Reviewable diffs with explicit scope, risks, and rollback notes |
| Main risk | Letting an interactive session drift into a giant unreviewable patch | Delegating vague work that returns a plausible but hard-to-audit change |

Verdict: Codex CLI wins for close collaboration, debugging, and learning a codebase; Codex Remote wins for bounded implementation tasks that already have crisp acceptance criteria. Mature Codex workflows use both, with the same AGENTS.md rules and the same review bar.

Make every Codex task reviewable

Give Codex a small unit of work and a definition of done. A good task is not "fix auth"; it is "make password reset tokens expire after 30 minutes, add tests for expired and valid tokens, and do not change session storage."
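One way to make that standard checkable is to write the task down as a small record and flag what is missing before it reaches Codex. The sketch below is illustrative only: `CodexTask`, its field names, and the five-word threshold for a vague goal are assumptions made for this example, not a Codex format.

```python
from dataclasses import dataclass, field


@dataclass
class CodexTask:
    """Hypothetical task record; field names are illustrative, not a Codex format."""

    goal: str
    done_when: list[str] = field(default_factory=list)
    non_goals: list[str] = field(default_factory=list)


def intake_problems(task: CodexTask) -> list[str]:
    """List reasons the task is not ready to hand to an agent."""
    problems = []
    # Crude, assumed heuristic: a goal of only a few words rarely names a behavior.
    if len(task.goal.split()) < 5:
        problems.append("goal is too vague; state the behavior that should change")
    if not task.done_when:
        problems.append("no definition of done")
    if not task.non_goals:
        problems.append("no explicit non-goals")
    return problems
```

With this sketch, a task whose goal is only "fix auth" fails all three checks, while the password-reset task passes once it lists its tests and the session-storage non-goal.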

Ask for a plan before code when the blast radius is unclear. Then ask Codex to implement one step at a time, run the relevant checks, and summarize the diff in reviewer language.

The trap is accepting a large diff because it looks coherent. A Codex-generated change should pass the same review gates as a human change: tests, types, security concerns, migrations, observability, and rollback. If your team skips those gates for AI output, the problem is not Codex; the problem is the process around it.

A simple review checklist helps. Require the PR description to include the prompt or task link, files changed, commands run, commands not run, and follow-up work. That gives reviewers enough context without asking them to replay the whole session.
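The checklist above can be enforced mechanically. The following is a minimal sketch, assuming the PR description uses Markdown headings named after each required item; the heading names and the `missing_sections` helper are hypothetical and should be adapted to your own PR template.

```python
# Assumed heading names; rename these to match your team's PR template.
REQUIRED_SECTIONS = (
    "Task",
    "Files changed",
    "Commands run",
    "Commands not run",
    "Follow-up",
)


def missing_sections(pr_body: str) -> list[str]:
    """Return the required sections that have no matching Markdown heading."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in pr_body.splitlines()
        if line.startswith("#")
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]
```

A check like this could run in CI and fail the build when the list is non-empty, so reviewers never have to ask which commands were skipped.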

Keep MCP and skills boring

Model Context Protocol, or MCP, is the integration layer that lets coding tools connect to external systems such as GitHub, Slack, document stores, databases, issue trackers, and private knowledge bases. Treat Codex MCP access like production access, not convenience glue.

Start with read-only integrations unless the team has a clear write path and audit trail. A safe MCP boundary note might say Codex may read issue descriptions and design docs, but must not create tickets, change labels, or write to production databases.
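A boundary note like this can also be written as data so that a wrapper or reviewer can apply it consistently. The sketch below assumes a simple policy: listed reads are allowed, listed writes are always denied, and anything unlisted needs explicit human approval. The `McpBoundary` type and action names such as `issues.read` are hypothetical labels for illustration, not part of MCP or any Codex API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McpBoundary:
    """Hypothetical boundary note expressed as data."""

    allowed_reads: frozenset[str]
    forbidden_writes: frozenset[str]


def is_allowed(boundary: McpBoundary, action: str, human_approved: bool = False) -> bool:
    """Decide whether an action fits inside the boundary."""
    if action in boundary.allowed_reads:
        return True
    if action in boundary.forbidden_writes:
        # Forbidden writes stay forbidden even with approval; change the note instead.
        return False
    # Anything not listed is denied unless a human explicitly approves it.
    return human_approved
```

Denying unlisted actions by default keeps the note short: the team only has to enumerate what is safe and what is never acceptable.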

Skills are best used for repeatable team knowledge: migration playbooks, release checklists, debugging recipes, and repo-specific scripts. The activation text matters because Codex needs to know when a skill applies. The trap is creating a pile of clever skills nobody trusts because they are stale, unowned, or too broad.

For Codex training, pair each integration with one allowed workflow. A Codex CLI workshop should not try to teach every feature in one sitting; it should teach one safe loop that the team can repeat on Monday.

Paste this operating checklist into your repo

Use this as a starter artifact for your first team rollout. Put the AGENTS.md items in the repo, then paste the workflow and review checklist into your team handbook or PR template.

# Codex team operating checklist

## Repository instructions
- [ ] Root AGENTS.md explains project layout, package manager, test commands, and review rules.
- [ ] Nested AGENTS.md files exist for risky areas such as payments, auth, data migrations, and background jobs.
- [ ] Rules are durable. Task context stays in the issue, prompt, or PR description.
- [ ] Owners are named for sensitive domains.

## Task intake
- [ ] The Codex task has a one-sentence goal.
- [ ] The task has explicit non-goals.
- [ ] The expected files or modules are named when known.
- [ ] The definition of done includes tests, docs, or migration notes.
- [ ] Risk level is marked: low, medium, or high.

## Codex CLI loop
- [ ] Ask Codex to inspect before editing when the code path is unfamiliar.
- [ ] Ask for a short plan before changes with medium or high risk.
- [ ] Keep the diff small enough for one human review.
- [ ] Run the narrowest useful verification first.
- [ ] Run the full required check before opening the PR when practical.

## Verification
- [ ] Unit tests run: record command and result.
- [ ] Type check or lint runs: record command and result.
- [ ] Manual check runs when behavior is user-visible.
- [ ] Any skipped check is listed with a reason.
- [ ] New failure is either fixed or called out clearly.

## MCP and external systems
- [ ] Codex has only the external access needed for this task.
- [ ] Read-only access is preferred by default.
- [ ] Write actions require explicit human approval.
- [ ] Production data is not copied into prompts or logs.

## PR review
- [ ] PR links to the task or includes the original prompt.
- [ ] PR summary explains what changed and why.
- [ ] Reviewer can see commands Codex ran.
- [ ] Reviewer can see known limitations or follow-ups.
- [ ] Human owner approves before merge.

Do not try to perfect this checklist on day one. Run it on two real tasks, remove the parts nobody uses, and make the missing safety checks painfully obvious.
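The Verification items in the checklist ask for each command and its result to be recorded, along with any skipped check and its reason. A minimal sketch of that record-keeping follows; `run_checks` is a hypothetical helper, and the commands you pass in are whatever your repository actually uses.

```python
import shlex
import subprocess


def run_checks(checks: dict[str, list[str]], skipped: dict[str, str]) -> str:
    """Run each check and return a Markdown list suitable for a PR description.

    `checks` maps a label to an argument list; `skipped` maps a label to a reason.
    """
    lines = []
    for name, command in checks.items():
        completed = subprocess.run(command, capture_output=True, text=True)
        if completed.returncode == 0:
            status = "passed"
        else:
            status = f"failed (exit {completed.returncode})"
        lines.append(f"- {name}: `{shlex.join(command)}` {status}")
    for name, reason in skipped.items():
        # Skipped checks are reported, never silently dropped.
        lines.append(f"- {name}: skipped ({reason})")
    return "\n".join(lines)
```

Pasting the returned text under a "Commands run" heading gives reviewers the command, the outcome, and the gaps in one place.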

Common questions

  • What are best practices for running OpenAI Codex with an engineering team?

    Use shared AGENTS.md rules, small scoped tasks, Codex CLI verification loops, and normal human PR review. The operating model comes down to four gates: instructions before work, plan before risky edits, checks before review, and reviewer approval before merge. Codex should make the diff easier to review, not exempt it from review.

  • Should we use Codex CLI or Codex Remote for team work?

    Use both, but use them for different jobs. Codex CLI is better when an engineer is actively pairing with the agent in a repo; Codex Remote is better for bounded delegated tasks that can return as a reviewable change. As of June 25, 2026, Codex Remote is listed as generally available in OpenAI's changelog.

  • What should go in AGENTS.md for Codex?

    Put durable repo rules in AGENTS.md: architecture constraints, test commands, package manager details, ownership boundaries, and review expectations. Keep it short enough that engineers will maintain it. A strong pattern is one root file for global rules, plus nested files for sensitive areas like auth, billing, migrations, and background workers.

  • How do we keep Codex MCP access safe?

    Limit MCP access to the systems and actions the task actually needs. Start read-only, require human approval for writes, and document boundaries near the workflow so engineers do not guess. One useful artifact is a short MCP boundary note listing allowed reads, forbidden writes, data handling rules, and the owner for exceptions.

  • How do we run a Codex CLI workshop without wasting a day?

    Run the workshop around one real repository task, not a tour of features. Give everyone the same AGENTS.md, one failing test or small improvement, and a fixed review checklist. A useful Codex CLI workshop ends with a merged or mergeable PR and one team rule you improve based on what happened.

Run one real task this week

Pick a low-risk bug, add or tighten AGENTS.md, run the Codex CLI loop, and make the PR description show exactly what changed and what checks ran. Then decide what rule your team wants Codex to follow next time.

One methodology lens

One useful way to read this through our methodology is the Plan step: delegate first-pass decomposition and dependency mapping, review the sequencing and assumptions, and keep ownership of scope and priorities. If that split is still fuzzy, the workflow usually is too.
