How to set up agentic coding workflows and guardrails

A field guide to agentic coding workflows and guardrails: handoff receipts, connector ownership, and review gates for engineering teams under deadline.

Château des environs de Paris, landscape painting by Paul Cézanne (1888).
Rogier Muller · May 15, 2026 · 6 min read

To set up agentic coding workflows and guardrails, make every agent handoff return receipts: the scope it worked on, the commands it ran, and proof the tests passed, all written into the PR before merge. A guardrail here is a repo-level rule that forces those receipts in, so a reviewer can check the work without replaying the chat. The same approach works across Claude Code (Anthropic's coding agent), Codex CLI (from OpenAI), and Cursor (Anysphere's AI code editor).

You will feel the gap in the PR comments first. Parent intent and child scope drift apart quietly, the deadline still holds, and nobody notices until the review queue backs up. That drift is the thing guardrails catch.

Spot the drift before metrics do

Parallel agents are not free parallelism. The part that breaks first is the reviewable story, not the model. A reviewer asks "why this approach?" and there is no written answer, because the summary shrank to a few bullet points during a busy week.

That is the early signal. When task summaries stop carrying intent, parent agents start green-lighting diffs they cannot explain, and the gap hides until the queue stalls.

The fix is boring on purpose: one written boundary per surface, checked at review. The teams that scale agents well are the ones whose handoffs come back explainable, not the ones with the most autonomy.

Write one boundary per surface

Each tool earns a guardrail the same way. You write down what is allowed, what is forbidden, and how a reviewer verifies it. Four boundaries cover most of what goes wrong.

Replay sandwich for Codex. Lean on Codex CLI and you will merge green checks no reviewer ever saw run. Have AGENTS.md require an intent line, then the command transcript, then a diff summary before the PR opens. Review becomes reproducible, and nobody has to stand behind a terminal.
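A minimal sketch of how a pre-merge check could enforce the replay sandwich. The three label strings and the function name are assumptions made for illustration; the rule only says an intent line, a command transcript, and a diff summary must be present, so adapt the labels to whatever your AGENTS.md actually asks for.

```python
import re

# Hypothetical section labels; the spelling is an assumption, not a Codex convention.
REQUIRED_SECTIONS = ("Intent:", "Command transcript:", "Diff summary:")


def missing_replay_sections(pr_body: str) -> list[str]:
    """Return the replay-sandwich labels that are absent or out of order."""
    missing = []
    position = 0
    for label in REQUIRED_SECTIONS:
        # The label must start a line and come after the previous label.
        pattern = re.compile(rf"^{re.escape(label)}", flags=re.MULTILINE)
        match = pattern.search(pr_body, position)
        if match is None:
            missing.append(label)
        else:
            position = match.end()
    return missing
```

An empty list means the PR body carries all three parts in order. Anything else is a reason to block the PR from opening, and the returned labels tell the author what to add.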

Connector card for MCP. Servers on the Model Context Protocol ship as capability demos, so read the OWASP LLM Top 10 before wiring in more. Write one markdown card per server: allowed actions, forbidden actions, owner, rollback. Incidents shrink because operators know what "off" looks like.
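One way to keep connector cards honest is to generate them from a small record and refuse cards that contradict themselves. This is a sketch under assumptions: the class name, field names, and markdown layout are hypothetical, and only the four fields (allowed actions, forbidden actions, owner, rollback) come from the card described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectorCard:
    """One card per MCP server (hypothetical structure)."""
    server: str
    allowed: tuple[str, ...]
    forbidden: tuple[str, ...]
    owner: str
    rollback: str

    def problems(self) -> list[str]:
        """List reasons an operator could not act on this card."""
        issues = []
        if not self.owner.strip():
            issues.append("owner is empty")
        if not self.rollback.strip():
            issues.append("rollback is empty")
        overlap = sorted(set(self.allowed) & set(self.forbidden))
        if overlap:
            issues.append(f"both allowed and forbidden: {', '.join(overlap)}")
        return issues

    def to_markdown(self) -> str:
        """Render the card as the markdown a reviewer would read."""
        lines = [f"# Connector card: {self.server}", "", "## Allowed"]
        lines += [f"- {action}" for action in self.allowed]
        lines += ["", "## Forbidden"]
        lines += [f"- {action}" for action in self.forbidden]
        lines += ["", f"Owner: {self.owner}", f"Rollback: {self.rollback}"]
        return "\n".join(lines)
```

A card with an empty rollback field is the one that hurts during an incident, so it is worth failing the check on it rather than warning.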

Child receipt block for chained agents. Chained agents return summaries that drop the paths a child actually touched, and it turns into a game of telephone. Make every child return the paths it changed, the commands it ran, and the tests proving its regression guards. Parents stop approving mystery diffs.
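The child receipt can be sketched as a small record plus two checks a parent runs before approving. The names here (`ChildReceipt`, `unreceipted_paths`, `incomplete_receipts`) are hypothetical, and the list of paths in the diff is assumed to come from whatever your tooling already produces.

```python
from dataclasses import dataclass, field


@dataclass
class ChildReceipt:
    """What one child agent hands back to its parent (hypothetical shape)."""
    paths_changed: list[str] = field(default_factory=list)
    commands_run: list[str] = field(default_factory=list)
    tests_passed: list[str] = field(default_factory=list)


def unreceipted_paths(diff_paths: list[str], receipts: list[ChildReceipt]) -> list[str]:
    """Return paths present in the diff that no child receipt claims."""
    claimed = {path for receipt in receipts for path in receipt.paths_changed}
    return sorted(set(diff_paths) - claimed)


def incomplete_receipts(receipts: list[ChildReceipt]) -> list[int]:
    """Return indexes of receipts missing paths, commands, or tests."""
    return [
        index
        for index, receipt in enumerate(receipts)
        if not (receipt.paths_changed and receipt.commands_run and receipt.tests_passed)
    ]
```

Any path returned by `unreceipted_paths` is a mystery diff by definition: it changed, and no child said so.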

Decision stub for review. CI is green and reviewers still ask why this approach, with no written answer waiting. Have the PR template force three lines: constraints considered, alternatives rejected, verification proof. The debate moves to explicit tradeoffs.
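A template can force the three lines to exist, but not to say anything. The sketch below flags stub lines that are missing or still hold a placeholder. The label spellings and the placeholder set are assumptions for illustration, not part of any PR template standard.

```python
# Hypothetical labels and placeholder values; adapt both to your PR template.
STUB_LABELS = ("Constraints considered", "Alternatives rejected", "Verification proof")
PLACEHOLDERS = {"", "n/a", "none", "tbd"}


def unanswered_stub_lines(pr_body: str) -> list[str]:
    """Return stub labels that are missing or still hold a placeholder."""
    answers = {}
    for line in pr_body.splitlines():
        # Split on the first colon only, so answers may contain colons.
        label, separator, value = line.partition(":")
        if separator:
            answers[label.strip()] = value.strip()
    return [
        label
        for label in STUB_LABELS
        if answers.get(label, "").lower() in PLACEHOLDERS
    ]
```

The check cannot judge whether an answer is good. It only guarantees the reviewer has something written to argue with.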

Here is a boundary snapshot you can drop in and adapt:

---
description: Delegation boundary snapshot (adapt globs to your repo)
globs:
  - "**/*"
alwaysApply: false
---

- Cursor: keep scopes explicit in `.mdc`; forbid undeclared MCP domains.
- Claude Code: cite `CLAUDE.md` precedence before expanding bash scope.
- Codex: ensure `AGENTS.md` carries replay-friendly verification notes for CLI runs.

In our methodology this lives in the Document stage, before it reaches Review: the handoff has to survive without the original operator in the room. The full track is agentic coding governance, and the cloud-agent version of the argument is in "Codex workspace agents need repo rules."

Make review the gate, not a rubber stamp

A guardrail that review cannot check is just a wish. Four questions turn it into a real gate, and you can paste them straight into a review template.

| Gate | Question |
| --- | --- |
| Replay proof | Which commands prove the regression guards? |
| Receipt match | Does the PR body list scopes plus the verification transcript? |
| Rules precedence | Which .mdc, SKILL.md, or CLAUDE.md governed behavior? |
| Connector truth | Which MCP servers fired, and were they expected? |
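If you would rather have the gate checked by a script than by memory, the four questions fit in a small lookup. This is a sketch with hypothetical function names. It assumes the reviewer's answers arrive as a plain dictionary keyed by gate name, and that the lists of fired and expected MCP servers come from your own logs and connector cards.

```python
# Gate names and questions are taken from the review table; everything else is illustrative.
GATES = {
    "Replay proof": "Which commands prove the regression guards?",
    "Receipt match": "Does the PR body list scopes plus the verification transcript?",
    "Rules precedence": "Which .mdc, SKILL.md, or CLAUDE.md governed behavior?",
    "Connector truth": "Which MCP servers fired, and were they expected?",
}


def open_gates(answers: dict[str, str]) -> list[str]:
    """Return gate names whose answer is missing or blank."""
    return [name for name in GATES if not answers.get(name, "").strip()]


def unexpected_servers(fired: list[str], expected: list[str]) -> list[str]:
    """Return MCP servers that fired but were not on the expected list."""
    return sorted(set(fired) - set(expected))
```

A PR passes the gate when `open_gates` returns nothing and `unexpected_servers` is empty. Either result being non-empty is a specific thing to ask the author about.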

If your repo cannot state its boundaries plainly, agents will guess, and guessing scales poorly. Tooling here is working language: if the repo cannot say "allowed" and "forbidden," neither can the agent.

Common questions

  • How do you set up agentic coding workflows and guardrails with Codex?

    Start with AGENTS.md. Make it require a replay sandwich: an intent line, the command transcript, and a diff summary before any PR opens. That turns Codex CLI runs into reviewable work, since commands that ran without narrative are impossible to verify after the fact. Receipts beat raw autonomy.

  • What guardrails do Cursor, Claude Code, and Codex need first?

    The first guardrail is a written boundary per surface. That means .mdc scopes for Cursor, CLAUDE.md precedence for Claude Code, and replay-friendly verification notes in AGENTS.md for Codex. Then add one connector card per MCP server listing allowed actions, forbidden actions, owner, and rollback, so incidents have a clear off switch.

  • How do you keep chained agent handoffs reviewable?

    Require a child receipt block. Every child agent returns the paths it touched, the commands it ran, and the tests proving its regression guards. Parents stop approving mystery diffs, because summaries on their own collapse into a telephone game. It turns agent output back into work the team actually owns.

  • What stops review queue theater when CI is already green?

    A decision stub in the PR template. It forces three lines: constraints considered, alternatives rejected, and verification proof. Green CI tells you the code runs, not why it was built that way, and the stub gives reviewers the written answer they keep asking for. The debate moves to explicit tradeoffs.

  • Where should teams start if they want to practice this?

    Pick one fix and ship it this week. Turn a single named boundary into a shared checklist or repo rule before the next automated run, then add the next one once it sticks. If you want a guided version, the hands-on training walks teams through it on a real repo.

Start with one fix

Choose the boundary that hurts most right now, write it into your repo rules, and check it at the next review. The white paper has the full operating model your platform lead can take into a steering meeting.
