What Dan Luu Learned About Agentic Coding
Dan Luu published field notes on coding with AI agents. This piece explains what he found and why bounded loops keep Codex work reviewable.

Dan Luu, a widely read engineering blogger, published “Agentic coding notes”: field notes on what he learned from letting AI coding agents work on real tasks. The problem he documents is drift: an agent loop with no edge keeps producing plausible edits nobody asked for. His answer, and the takeaway of this article, is to judge the loop, the context, and the review path before you judge the model.
Agentic coding is the practice of letting a coding agent plan, edit, run checks, and iterate with some autonomy. The interesting part is not that the agent writes code. It is how quickly a normal repo can drift when the loop has no edge.
Read the note as a field report, not a benchmark
Dan Luu’s “Agentic coding notes” is an appendix to a larger post about AI coding, and the appendix matters because it talks about the work around the writing. It is a note about agentic loops: giving a model a task, letting it act, checking the result, and deciding whether to continue.
That is why developers cared. The post is not selling a universal workflow. It is closer to a lab notebook from someone trying to understand where these systems help, where they get weird, and how much human babysitting still hides inside a “hands-off” story.
The trap is reading it as a product ranking. A field note can be more useful than a leaderboard because it shows the shape of failure. In agentic coding, the shape of failure is often not one bad line. It is ten small plausible edits produced by a loop that was allowed to keep going.
There was also some thread-level noise around geography: an alt-text reference apparently used “Galapagos Island” in a way readers connected to Vancouver, while many public references to the “Galápagos of Canada” point to Haida Gwaii instead. That is worth correcting if you are discussing the post carefully, but it does not change the engineering point.
Notice the cost of every extra turn
The sharpest lesson in the note is that agentic loops are not free. Every turn spends context, money, reviewer attention, and sometimes repo integrity.
This matters for Codex users because Codex workflows often make iteration feel cheap. OpenAI’s Codex can inspect a repo, propose edits, and run commands, which is exactly what you want for a tight verification loop. It is also exactly why you need to say where the loop stops.
A small example is a backend service with a flaky test. A coding agent can patch the failure, run pytest, inspect the trace, patch again, and keep going. That sounds efficient until it edits the test to match the bug, removes an assertion, and hands you a green run.
A better boundary is plain and boring:
# AGENTS.md
## Verification boundary
- You may run unit tests and linters for files you changed.
- Do not modify tests only to make a failing run pass.
- If the same test fails twice, stop and report the failing command, error, and suspected cause.
- Before final output, list every file changed and the exact verification command used.
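The same stop rule can also be enforced outside the agent, in whatever script drives it. The sketch below is a minimal illustration, not part of Codex or any real CLI: `run_bounded_check`, `LoopReport`, and the `fix_step` hook are hypothetical names, and the limit of two failures mirrors the rule above.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class LoopReport:
    """What a reviewer needs to see when the loop stops."""
    command: list
    attempts: int = 0
    passed: bool = False
    last_output: str = ""


def run_bounded_check(command, max_failures=2, fix_step=None):
    """Run `command` until it passes or has failed `max_failures` times.

    `fix_step` is a hypothetical hook: the place where an agent would be
    allowed one more patch attempt between failures.
    """
    report = LoopReport(command=list(command))
    failures = 0
    while failures < max_failures:
        report.attempts += 1
        result = subprocess.run(command, capture_output=True, text=True)
        report.last_output = (result.stdout + result.stderr).strip()
        if result.returncode == 0:
            report.passed = True
            return report
        failures += 1
        # Only offer another fix attempt if there is budget left to verify it.
        if failures < max_failures and fix_step is not None:
            fix_step(report)
    # Budget exhausted: stop and hand the evidence to a human.
    return report
```

Calling `run_bounded_check([sys.executable, "-m", "pytest", "services/billing/tests", "-q"])` would run the check at most twice and return the failing command and its last output instead of looping until something turns green.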
The trap is treating a large context window as a substitute for judgment. More context helps, and as of July 2026 large context windows have made some earlier prompt-chaining tricks less necessary. But a bigger prompt can also carry more stale assumptions, more irrelevant instructions, and more ways for the agent to rationalize a bad path.
Keep context local enough to be argued with
The note lands in a moment where AI coding for teams is becoming less about “which model writes better code” and more about “which facts did the model see.” Context is now part of the build surface.
For Codex, the durable facts should live close to the work. Put repo-level rules in AGENTS.md. Put package-specific rules in nested instruction files when the repo has different conventions across services. Keep task-specific intent in the prompt, not in permanent memory.
Here is a practical boundary for a plausible monorepo layout:
# services/billing/AGENTS.md
## Local rules
- Use decimal-safe money helpers from `billing/money.py`.
- Never introduce floating point math for prices, tax, credits, or refunds.
- Run `pytest services/billing/tests` after changing billing logic.
- Mention any migration or backfill risk in the handoff.
This is small enough for a reviewer to challenge. It is also specific enough that a coding agent can actually use it.
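To make the floating point rule concrete, here is a small sketch of what decimal-safe helpers might look like. The article names `billing/money.py` but does not show its contents, so `to_money` and `apply_tax` are hypothetical stand-ins built on Python's standard `decimal` module.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_money(value):
    """Convert a string, int, or Decimal to an amount rounded to whole cents.

    Floats are rejected outright so binary rounding error cannot leak in.
    """
    if isinstance(value, float):
        raise TypeError("pass money as a string or Decimal, not a float")
    return Decimal(value).quantize(CENT, rounding=ROUND_HALF_UP)


def apply_tax(amount, rate):
    """Return the tax owed on `amount` at `rate`, rounded to whole cents."""
    return (amount * rate).quantize(CENT, rounding=ROUND_HALF_UP)


# Binary floats cannot represent 0.1 or 0.2 exactly, so this comparison fails.
print(0.1 + 0.2 == 0.3)  # False
# The same sum with Decimal amounts is exact.
print(to_money("0.10") + to_money("0.20") == to_money("0.30"))  # True
```

A rule like the one above gives a reviewer something checkable: any new `float(...)` call or bare float literal in billing code is a reject signal, regardless of whether the tests pass.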
The trap is one giant root instruction file that reads like a company handbook. Agents do not need every preference at once. They need the rule that prevents this change from being wrong.
For a broader map of this problem space, see the related training topic and the companion research note on Agentic Coding Notes and Bounded Loops.
Put integrations behind a narrow door
MCP, the Model Context Protocol, is a standard way for AI systems to connect to tools and external context such as repositories, ticket systems, databases, and document stores. It is useful because it gives agents a common interface. It is risky because a common interface can make powerful actions feel ordinary.
A good first MCP boundary is read-only. Let the agent search issues, inspect docs, or fetch schema information before it can mutate tickets, write comments, or touch production data.
For example, a Codex session investigating a bug might read a GitHub issue, inspect the repo, and query a read-only staging schema. That is enough to form a patch. It does not need permission to close the issue, rewrite the incident doc, or run a migration.
A simple permission table keeps the conversation honest:
| Surface | First permission | Later permission, if earned |
|---|---|---|
| GitHub issues | Read | Comment with draft summary |
| Database | Read schema only | Read staging rows with approval |
| Docs | Search and read | Propose edits in a branch |
| CI | Read logs | Rerun specific failed job |
The trap is connecting everything because the demo looks better. Agents become more valuable when they can reach real context, but each new tool also creates a new failure mode.
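One way to keep the table from becoming decoration is to encode it as data with a default-deny check. This is a plain Python sketch, not the MCP SDK or any real server configuration: the surface and action names are illustrative labels taken from the table, and `is_allowed` is a hypothetical helper.

```python
# Permissions every session starts with (the "First permission" column).
FIRST_PERMISSIONS = {
    "github_issues": {"read"},
    "database": {"read_schema"},
    "docs": {"search", "read"},
    "ci": {"read_logs"},
}

# Permissions that exist but must be granted explicitly (the "Later" column).
LATER_PERMISSIONS = {
    "github_issues": {"comment_draft_summary"},
    "database": {"read_staging_rows"},
    "docs": {"propose_edit_in_branch"},
    "ci": {"rerun_failed_job"},
}


def is_allowed(surface, action, earned=frozenset()):
    """Default deny: allow only first permissions, or later ones explicitly earned.

    `earned` is a set of (surface, action) pairs a human has approved.
    """
    if action in FIRST_PERMISSIONS.get(surface, set()):
        return True
    is_known_later = action in LATER_PERMISSIONS.get(surface, set())
    return is_known_later and (surface, action) in earned


print(is_allowed("github_issues", "read"))    # True
print(is_allowed("github_issues", "close"))   # False: never listed
print(is_allowed("ci", "rerun_failed_job"))   # False: not yet earned
```

The design choice worth noticing is that an unknown surface or action is denied rather than passed through, so connecting a new tool does nothing until someone adds a row.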
Try one bounded loop before changing the workflow
The practical move from “Agentic coding notes” is not a grand operating model. It is one small experiment with a hard stop.
Pick a low-risk repo task: fix a failing unit test caused by an obvious implementation bug, update a small API client after a schema change, or add validation to a form handler. Ask Codex to plan, edit, run the narrow check, and stop after one failed retry.
This is where AI coding stops being abstract for mixed-experience teams. A senior engineer can inspect whether the agent respected architecture. A newer engineer can learn from the plan, command output, and final handoff. Both can review the same artifacts without pretending they have the same mental model.
Use a skill acceptance rubric instead of vibe-checking the output:
| Acceptance point | Pass signal | Reject signal |
|---|---|---|
| Scope | Only files related to the task changed | Drive-by refactors or renamed helpers |
| Context | Uses local repo rules from AGENTS.md | Ignores package conventions |
| Verification | Shows exact command and result | Says “tests pass” without evidence |
| Failure handling | Stops after repeated failure and explains | Keeps editing until the run happens to go green |
| Handoff | Lists risks and changed files | Summarizes confidently but vaguely |
The trap is measuring only speed. Developer productivity improves when the code is easier to review, not when the diff arrives fastest.
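The Scope row of the rubric is the easiest one to check mechanically. The sketch below assumes the reviewer already has the list of changed paths (for example from the agent's handoff) and a list of directories the task was allowed to touch; `out_of_scope` is a hypothetical helper, not part of any tool named in this article.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_files, allowed_dirs):
    """Return the changed files that fall outside every allowed directory.

    Paths are compared component by component, so `services/billing_v2`
    does not count as being inside `services/billing`.
    """
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    flagged = []
    for name in changed_files:
        parents = PurePosixPath(name).parents
        if not any(d in parents for d in allowed):
            flagged.append(name)
    return flagged


changed = [
    "services/billing/refunds.py",
    "services/billing/tests/test_refunds.py",
    "services/auth/login.py",
]
print(out_of_scope(changed, ["services/billing"]))
# ['services/auth/login.py']
```

A non-empty result is not automatically a rejection, but it is the point where the agent should have to explain the drive-by change.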
Try it safely checklist
Use this when you want one contained experiment, not a new process.
- Choose a task that can be verified with one command.
- Add or tighten the local AGENTS.md rule before starting.
- Tell Codex the maximum retry count: usually one retry after a failed check.
- Allow read-only MCP access first, especially for issues, docs, and schemas.
- Require a final handoff with changed files, commands run, failures seen, and remaining uncertainty.
- Review the diff before reading the agent’s explanation, so the prose does not soften your judgment.
- Save the prompt, final handoff, and review notes if the experiment taught you something reusable.
A good handoff receipt is short:
## Handoff receipt
Task: Add decimal-safe refund validation
Changed files:
- services/billing/refunds.py
- services/billing/tests/test_refunds.py
Verification:
- pytest services/billing/tests/test_refunds.py -q
Result: passed
Notes:
- Used existing Money helper
- No migration required
- Did not touch refund backfill path
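A receipt in this shape is regular enough to lint before a human reads it. The following sketch assumes the exact labels used in the example above (`Task`, `Changed files`, `Verification`, `Result`, `Notes`); `parse_receipt` and `missing_fields` are hypothetical helpers, and a team with a different receipt format would need different labels.

```python
REQUIRED_LABELS = ("Task", "Changed files", "Verification", "Result", "Notes")

EXAMPLE = """\
## Handoff receipt
Task: Add decimal-safe refund validation
Changed files:
- services/billing/refunds.py
- services/billing/tests/test_refunds.py
Verification:
- pytest services/billing/tests/test_refunds.py -q
Result: passed
Notes:
- Used existing Money helper
"""


def parse_receipt(text):
    """Collect the values under each known label, inline or as bullet items."""
    fields = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the heading
        if line.startswith("- ") and current is not None:
            fields[current].append(line[2:].strip())
            continue
        label, sep, rest = line.partition(":")
        if sep and label in REQUIRED_LABELS:
            current = label
            fields[current] = [rest.strip()] if rest.strip() else []
    return fields


def missing_fields(text):
    """Return required labels that are absent or present with no content."""
    fields = parse_receipt(text)
    return [label for label in REQUIRED_LABELS if not fields.get(label)]


print(missing_fields(EXAMPLE))  # []
```

This only checks that the evidence is present, not that it is true. The reviewer still reruns the verification command and reads the diff.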
One methodology lens
One useful way to read this through our methodology is the Plan step: delegate first-pass decomposition and dependency mapping, review the sequencing and assumptions, and keep ownership of scope and priorities. If that split is still fuzzy, the workflow usually is too.
Practical starter checklist
- [ ] Name the Codex artifact first: an AGENTS.md instruction, a Codex CLI verification loop, an MCP boundary note, or a skills handoff.
- [ ] Write the review checklist before generation starts: scope, owner, tests, rollback.
- [ ] Keep the first step small enough that a reviewer can inspect the receipt without replaying the whole chat.
Common questions
- What should teams know about AI coding for teams?
  Start by writing down one visible team rule for Codex, not a loose preference. In practice that means a short repository convention, a review checklist, and one owner who can reject agent output when the evidence is missing.
- Which Codex artifact should teams standardize first?
  Standardize the smallest artifact that reviewers already touch: an AGENTS.md instruction, MCP note, or verification checklist. The point is not documentation volume; it is a shared place where scope, allowed tools, expected tests, and rollback notes are visible before generated code reaches review.
- How do teams know the convention is working?
  The convention is working when reviewers can approve or reject agent output from the artifact and evidence alone. Track whether pull requests name the rule used, include the promised checks, and avoid replaying long sessions just to understand what changed.
Best ways to use this research
- Best for: Codex teams deciding which AGENTS.md instruction, CLI workflow, MCP boundary, or verification loop to standardize next around “Agentic Coding Notes, Read Closely.”
- Best first artifact: turn the named fix into an AGENTS.md rule, verification checklist, MCP note, or review receipt before the next automated run.
- Best comparison angle: compare the workflow against the current Codex CLI review loop, shell boundary, and evidence trail; keep the path that leaves the shortest auditable trail.
Further reading
- Agentic coding notes — source
- Model Context Protocol — specification
- GitHub — anthropics/skills
- developers.google.com: fundamentals creating helpful content
Where to go next
Start from the related training topic and make the first exercise prove scope, verification, and ownership in the PR body.