Agentic workflows from PR to merge

A PR review workflow for agentic coding teams: connector ownership, scoped tasks, replay transcripts, and human approval lanes from PR to merge.

A Storm Behind the Isle of Wight, landscape painting by Julius Caesar Ibbetson (1790).
Rogier Muller · May 9, 2026 · 6 min read

Agentic workflows from PR to merge hold up when every handoff carries receipts: the scope the agent worked in, the commands it ran, and proof the change passes. A PR review workflow is the repo contract that lets a reviewer trust agent output without scrolling back through the chat. We watch teams trip on the same thing during readiness drills: the parent's intent and the child agent's scope quietly disagree, and nobody feels it until weeks later, when someone asks why a change ever merged.

The lever here is not faster reviewers. A reviewer cannot check work the repo never wrote down. So you make the repo write it down at every handoff.

Make each handoff leave a receipt

Pick one receipt format per kind of handoff, and keep the receipt in the repo, not in a chat window. Four show up again and again. Each one fixes a specific way that agent output slips past review.

The first is replay. If your team uses Codex CLI, OpenAI's coding agent, you can end up merging green builds whose transcript no reviewer ever saw. Fix it with a replay sandwich: AGENTS.md asks for an intent line, then the command transcript, then a diff summary, all before the PR opens. Now review is reproducible and nobody has to stand behind your terminal.
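A replay sandwich is easier to hold to when something checks for it. The sketch below is one possible check, not a Codex CLI feature: it assumes the PR body marks the three parts with the labels `Intent:`, `Transcript:`, and `Diff summary:` at the start of a line. Those labels and the function name are hypothetical; adapt them to whatever your AGENTS.md actually asks for.

```python
# Hypothetical section labels; the article names the three parts, not their spelling.
REQUIRED_SECTIONS = ("Intent:", "Transcript:", "Diff summary:")


def missing_replay_sections(pr_body: str) -> list[str]:
    """Return the replay-sandwich labels that are absent or out of order."""
    lines = [line.strip() for line in pr_body.splitlines()]
    missing = []
    cursor = 0
    for label in REQUIRED_SECTIONS:
        # Search only below the previous section so the order is enforced.
        index = next(
            (i for i in range(cursor, len(lines)) if lines[i].startswith(label)),
            None,
        )
        if index is None:
            missing.append(label)
        else:
            cursor = index + 1
    return missing


if __name__ == "__main__":
    body = "Intent: fix flaky retry\nTranscript:\n$ pytest -q\nDiff summary: 2 files"
    print(missing_replay_sections(body))
```

A check like this could run in CI or as a pre-PR step; an empty list means all three parts are present in the expected order.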

The second is connector reach. Wire up MCP (Model Context Protocol) servers quickly and some connector will touch data nobody put on the diagram. Connectors ship as capability demos, so trust boundaries have to be made explicit. Write one connector card per MCP server: allowed actions, forbidden actions, owner, rollback. When something goes wrong, the operator already knows what "off" looks like.
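The four fields of a connector card map naturally onto a small record. The following is a minimal sketch of that shape, assuming a deny-by-default reading of "allowed" and "forbidden"; the class, the function, and the example server and action names are all hypothetical and not part of MCP itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectorCard:
    """One card per MCP server: allowed actions, forbidden actions, owner, rollback."""

    server: str
    owner: str
    rollback: str
    allowed: frozenset[str]
    forbidden: frozenset[str]

    def permits(self, action: str) -> bool:
        # Deny by default: an action passes only if it is listed as allowed
        # and is not also listed as forbidden.
        return action in self.allowed and action not in self.forbidden


def card_problems(card: ConnectorCard) -> list[str]:
    """List the gaps that would make a card useless during an incident."""
    problems = []
    if not card.owner.strip():
        problems.append("missing owner")
    if not card.rollback.strip():
        problems.append("missing rollback")
    overlap = card.allowed & card.forbidden
    if overlap:
        problems.append(f"both allowed and forbidden: {sorted(overlap)}")
    return problems


if __name__ == "__main__":
    # Hypothetical server and action names, for illustration only.
    card = ConnectorCard(
        server="issue-tracker",
        owner="platform-team",
        rollback="remove the server entry from the MCP config",
        allowed=frozenset({"read_issue"}),
        forbidden=frozenset({"delete_issue"}),
    )
    print(card.permits("read_issue"), card_problems(card))
```

Treating anything unlisted as denied keeps the card honest: a new capability has to be written down before an agent can rely on it.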

The third is chained agents. When a parent hands work to a child, the summary that comes back tends to drop the paths the child actually edited. A child receipt block fixes that: every child returns the paths it touched, the commands it ran, and the tests that prove the regression guards still hold. The parent stops green-lighting diffs it never read.
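A child receipt block can be as simple as three lists, and the parent's job becomes a set comparison against the diff. This sketch assumes the parent can obtain the changed paths of the diff as strings; the type and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ChildReceipt:
    """What every child agent returns to its parent."""

    paths_touched: tuple[str, ...]
    commands_run: tuple[str, ...]
    tests_passed: tuple[str, ...]


def unreceipted_paths(diff_paths: set[str], receipts: list[ChildReceipt]) -> set[str]:
    """Paths that changed in the diff but that no child claimed."""
    claimed = {path for receipt in receipts for path in receipt.paths_touched}
    return diff_paths - claimed


def doubly_claimed_paths(receipts: list[ChildReceipt]) -> set[str]:
    """Paths claimed by more than one child, which need reconciling before merge."""
    counts = Counter(
        path for receipt in receipts for path in set(receipt.paths_touched)
    )
    return {path for path, count in counts.items() if count > 1}


if __name__ == "__main__":
    receipts = [
        ChildReceipt(("src/a.py",), ("pytest tests/test_a.py",), ("tests/test_a.py",)),
        ChildReceipt(("src/a.py", "src/c.py"), ("pytest -q",), ("tests/test_c.py",)),
    ]
    print(unreceipted_paths({"src/a.py", "src/b.py", "src/c.py"}, receipts))
    print(doubly_claimed_paths(receipts))
```

If the first function returns anything, the parent is about to approve a path it has no receipt for; if the second does, two children edited the same file and someone needs to reconcile the result.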

The fourth is the silent "why." CI passes, yet a reviewer still wants to know why this approach and not another, with no answer written anywhere. A decision stub in the PR template forces three lines: constraints considered, alternatives rejected, verification proof. The debate moves from taste to tradeoffs you can point at.
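The decision stub only works if the three lines are actually filled in. As a rough sketch, assuming the PR template writes each one as `Field name: answer` on a single line (a hypothetical layout), a check for blank fields could look like this:

```python
# The three forced lines from the decision stub; the "Name: value" layout is assumed.
DECISION_FIELDS = (
    "Constraints considered",
    "Alternatives rejected",
    "Verification proof",
)


def blank_decision_fields(pr_body: str) -> list[str]:
    """Return the decision-stub fields that are missing or left empty."""
    filled = {}
    for line in pr_body.splitlines():
        name, separator, value = line.partition(":")
        if separator and name.strip() in DECISION_FIELDS:
            filled[name.strip()] = value.strip()
    # A field counts as blank if it never appeared or has no text after the colon.
    return [name for name in DECISION_FIELDS if not filled.get(name)]


if __name__ == "__main__":
    body = (
        "Constraints considered: public API must stay stable\n"
        "Alternatives rejected:\n"
        "Verification proof: pytest -q passed locally\n"
    )
    print(blank_decision_fields(body))
```

This only proves the lines are non-empty. Whether the constraints and alternatives are the right ones is still the reviewer's call.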

Drop in a delegation snapshot

Here is a small file you can adapt and commit. It names what each agent is allowed to assume about scope, so a reviewer reads one place instead of three chats.

---
description: Delegation boundary snapshot (adapt globs to your repo)
globs:
  - "**/*"
alwaysApply: false
---

- Cursor: keep scopes explicit in `.mdc`; forbid undeclared MCP domains.
- Claude Code: cite `CLAUDE.md` precedence before expanding bash scope.
- Codex: ensure `AGENTS.md` carries replay-friendly verification notes for CLI runs.

Cursor is Anysphere's AI code editor, and its .mdc rules are the natural home for scope. Claude Code and Codex carry the same idea in CLAUDE.md and AGENTS.md. The point is that the boundary lives in a file the reviewer can read, not in a conversation they have to reconstruct.

Check four gates before you approve

A PR is approvable when these four questions have written answers in the PR body. If any one is blank, the receipt is missing and the review is guessing.

| Gate | Question |
| --- | --- |
| Replay proof | Which commands prove the regression guards hold? |
| Receipt match | Does the PR body list scopes plus a verification transcript? |
| Rules precedence | Which .mdc, SKILL.md, or CLAUDE.md governed behavior? |
| Connector truth | Which MCP servers fired, and were they expected? |
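The receipt match gate is the easiest of the four to automate, because it reduces to comparing declared scopes with the folders in the diff. The sketch below assumes scopes are written as repo-relative directory paths; the function name and example paths are hypothetical.

```python
from pathlib import PurePosixPath


def paths_outside_scopes(diff_paths: list[str], declared_scopes: list[str]) -> list[str]:
    """Return changed paths that fall under none of the declared scope directories."""
    scopes = [PurePosixPath(scope) for scope in declared_scopes]
    return sorted(
        path
        for path in diff_paths
        # is_relative_to compares whole path components, so "src/billing"
        # does not accidentally cover "src/billing_old". Requires Python 3.9+.
        if not any(PurePosixPath(path).is_relative_to(scope) for scope in scopes)
    )


if __name__ == "__main__":
    changed = ["src/billing/invoice.py", "src/auth/token.py"]
    print(paths_outside_scopes(changed, ["src/billing"]))
```

An empty result means every changed file sits inside a scope the PR body declared. Anything else is a path the agent edited outside its stated boundary.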

A quick checklist to paste into the PR template:

  • Primary-doc links were smoke-checked after publishing edits.
  • MCP connectors mentioned (if any) list owners.
  • Verification command output is pasted or linked.
  • Forked agent work lists parent and child responsibilities.

These receipts are part of the wider agentic coding governance playbook, and they matter twice over once agents gain browser control, the case we work through in browser control guardrails for AI coding agents.

Keep ownership with people

None of this hands architecture judgment to the agent. Agents speed up execution; ownership stays with the team. Tooling is load-bearing language, so if the repo cannot say "allowed" and "forbidden," neither can the agent.

You will know it is working when standups stop being archaeology. The conversation goes back to design instead of digging through old diffs to figure out who decided what.

Common questions

  • What do agentic workflows from PR to merge need before approval?

    The PR body needs scopes that match the folders in the diff, plus a verification transcript a reviewer can replay. The decision stub adds three forced lines: constraints considered, alternatives rejected, and verification proof. With those in place, the review checks evidence instead of arguing about taste, and nobody is reconstructing intent from a chat log.

  • How does the replay sandwich fix Codex review gaps?

    The replay sandwich makes AGENTS.md require an intent line, the command transcript, and a diff summary before the PR opens. Review becomes reproducible without standing behind someone's terminal. That is what turns a merged green build into work the team actually owns, because the reviewer saw the same run the agent did.

  • What is a connector card for MCP servers?

    A connector card is one markdown card per MCP server that lists allowed actions, forbidden actions, owner, and rollback. Connectors ship as capability demos, so least privilege needs explicit trust boundaries written down somewhere. Once the cards exist, incidents shrink, because the operator already knows what turning the connector off should look like.

  • Why do chained agents blur ownership before merge?

    Chained agents blur ownership when summaries replace receipts. The child receipt block fixes it: every child returns the paths it touched, the commands it ran, and the tests proving the regression guards hold. Parents stop green-lighting mystery diffs. The duplicated edit that nobody reconciled is the expensive bug this prevents.

Where to go next

We rehearse this PR-to-merge drill with teams on their own repos, receipts enforced from the first run. The format is on the training page.
