
Codex GPT-5.5 and browser checks

GPT-5.5, in-app browser use, and auto-reviewed approvals in Codex. What changed and how to tighten the loop.

Rogier Muller · May 2, 2026 · 5 min read

The situation

OpenAI’s Codex changelog lists three changes that matter for real codebase loops: GPT-5.5 is available in Codex, the app can drive its in-app browser for local development servers and file-backed pages, and some approval prompts can pass through an automatic reviewer before execution.

For Codex users, the practical question is what to change in the next repo session. Usually it is not a full rewrite. It is tighter instructions, explicit verification, and a clearer line on which steps need human review. That maps to Codex CLI workflows and to the repo files that shape them, especially AGENTS.md.

This is for teams using Codex for implementation, refactors, debugging, testing, and reviewable diffs. For the broader workflow framing, keep the related training topic open.

What changed

  1. Start with the model switch, but treat it as a workflow choice. The changelog says GPT-5.5 is the recommended choice for most Codex tasks when it appears in the model picker, especially for implementation, refactors, debugging, testing, validation, and knowledge-work artifacts. In the CLI, that means starting a new thread with codex --model gpt-5.5 or switching with /model during a session. In the IDE extension and Codex app, the same choice is in the composer model selector.
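
As a concrete sketch of the rollout caveat, a small shell helper can pick the newer model only when the client actually lists it. `MODEL_LIST` and `pick_model` are illustrative names standing in for the client's model picker, not Codex CLI features:

```shell
# Hypothetical rollout fallback: prefer GPT-5.5 when the client lists it,
# otherwise stay on GPT-5.4, per the changelog's rollout guidance.
# MODEL_LIST stands in for whatever your client's model picker shows.
MODEL_LIST="gpt-5.4 gpt-5.5"

pick_model() {
  case " $1 " in
    *" gpt-5.5 "*) echo "gpt-5.5" ;;  # new model is rolled out to this client
    *)             echo "gpt-5.4" ;;  # rollout has not reached us yet
  esac
}

echo "codex --model $(pick_model "$MODEL_LIST")"
```

The same decision happens manually in the IDE extension and Codex app: take GPT-5.5 if the selector shows it, otherwise keep working on GPT-5.4.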

  2. Update repo instructions before asking for larger changes. Codex reads AGENTS.md, so that file is the place for architecture rules, verification expectations, and local conventions. Keep it short and operational.

# AGENTS.md

- Keep changes scoped to the smallest reviewable diff.
- If you touch business logic, add or update tests in the same change.
- Run the project verification command before asking for review.
- Prefer existing patterns over new abstractions unless the task requires a refactor.
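
The "project verification command" in the sample above works best as one script with one success line. A minimal sketch, assuming a hypothetical scripts/verify.sh; the lint and test steps are placeholders to replace with your repo's real commands:

```shell
# Hypothetical scripts/verify.sh: the single verification command AGENTS.md
# tells Codex to run before asking for review. Steps are placeholders.
verify() {
  set -e                      # stop at the first failing step
  echo "lint..."              # e.g. ruff check . / npm run lint
  echo "unit tests..."        # e.g. pytest -q / npm test
  echo "verification passed"  # one success line Codex can report back
}
verify
```

A single entry point keeps the instruction in AGENTS.md short ("run the project verification command") and keeps the check deterministic across sessions.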
  3. Use GPT-5.5 where the loop benefits from stronger synthesis. The changelog positions it for implementation, refactors, debugging, testing, validation, and knowledge-work artifacts. A practical split is to use it for multi-step repo work, then keep the final check on deterministic verification. Ask Codex to implement, ask it to explain the diff, then run the repo’s own tests or checks.

  4. Use the browser when the bug is visual or interaction-based. The Codex app can operate the in-app browser for local development servers and file-backed pages. That helps when a fix needs a click path, a rendered state, or a visual confirmation that a local server behaves as expected. Browser settings also let teams review allowed and blocked websites. The habit to build is simple: browser for reproduction and confirmation, CLI for code changes and repeatable checks.
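
For the reproduction half of that habit, a file-backed page can be served locally with the Python standard library so the browser has something to open. The dist/ directory and port 8123 are assumptions for illustration:

```shell
# Sketch: stand up a local server for a file-backed page so the in-app
# browser (or any browser) can render it. dist/ and port 8123 are examples.
mkdir -p dist
echo '<h1>render check</h1>' > dist/index.html
python3 -m http.server 8123 --directory dist >/dev/null 2>&1 &
SERVER_PID=$!
sleep 1                                   # give the server a moment to bind
PAGE="$(curl -s http://localhost:8123/index.html)"
echo "$PAGE"                              # the markup the browser would render
kill "$SERVER_PID"
```

Once the page is reachable, the rendered confirmation happens in the browser; the code change and the repeatable checks stay in the CLI loop.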

  5. Treat automatic approval review as a gate. The changelog says eligible approval prompts can be routed through an automatic reviewer agent before the request runs, and the app shows the review status and risk level. Teams still need a policy for what the reviewer may approve, what must stop for human review, and what should never run without explicit sign-off.
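
A policy like that can start as a simple triage table. A hedged sketch in shell; the tiers and the command patterns are our own convention for illustration, not a Codex API or its actual reviewer logic:

```shell
# Hypothetical triage table: which commands the automatic reviewer may pass,
# which must stop for a human, and which need explicit sign-off.
approval_tier() {
  case "$1" in
    "git status"|"git diff"*|"npm test"|pytest*) echo "auto-approve" ;;
    "git push"*|"npm publish"*)                  echo "human review" ;;
    "rm -rf"*|*"DROP TABLE"*)                    echo "explicit sign-off only" ;;
    *)                                           echo "human review" ;;  # default closed
  esac
}

approval_tier "npm test"              # auto-approve
approval_tier "git push origin main"  # human review
```

The useful property is the default: anything the table does not recognize falls back to human review rather than silent approval.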

  6. Keep the loop reviewable. A good Codex session should end with a diff a teammate can inspect, a test result they can reproduce, and a note about any browser-based confirmation or approval decision. If the task touched external systems, add an MCP boundary note: what connector was used, what scope it had, and what data it could reach.

What teams should try

A compact starter checklist helps teams adopt the update without overfitting to the model release:

  • Confirm GPT-5.5 is available in your Codex client; otherwise stay on GPT-5.4 during rollout.
  • Add or tighten AGENTS.md instructions for scope, tests, and review.
  • Use browser mode only for tasks that need rendered verification.
  • Record whether approval prompts were auto-reviewed, denied, stopped, or timed out.
  • Require a human check on the final diff before merge.
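
One way to make the checklist concrete is a short end-of-session note committed alongside the diff. The file name and fields below are our convention for illustration, not a Codex output format:

```shell
# Hypothetical end-of-session note covering the checklist items above.
mkdir -p review
cat > review/session-note.md <<'EOF'
## Codex session note
- Model: GPT-5.5 (selected in client)
- Diff: scoped to one feature; see the attached git diff
- Verification: project verification command passed locally
- Browser check: fix confirmed against the local dev server
- Approvals: one prompt auto-reviewed (low risk); final diff awaits human review
EOF
cat review/session-note.md
```

A template like this costs a few seconds per session and gives the reviewing teammate the diff, the test result, and the approval trail in one place.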

A small methodology note: this is a testing problem as much as a model problem. The useful change is a tighter verification loop that makes the result easier to trust. That is the kind of step we try to keep explicit in our methodology.

Tradeoffs and limits

GPT-5.5 may be the recommended choice for many tasks, but rollout timing still matters. If it is not visible in your client yet, the changelog says to update the CLI, IDE extension, or Codex app, and to keep using GPT-5.4 during rollout. Teams should not block work on the new model being present everywhere on day one.

Browser use is helpful, but it also widens what Codex can touch. That makes website allow/block settings and local-server discipline more important. If a task does not need rendered verification, keep it in the CLI loop.

Automatic approval review can reduce friction, but it can also create false confidence if teams stop reading the risk signal. Use it to triage, not to waive review culture. For production repos, the safer pattern is still: instruction file, scoped change, deterministic test, human review.
