Codex Best Practices

Codex is no longer just a command-line coding tool. OpenAI now presents it as an AI coding agent that can write, review, edit, and run code across the Codex CLI, IDE extension, Codex app, Codex web/cloud, and GitHub review workflows.

That changes what “best practices” means. It is not only about writing better prompts. It is about building an engineering workflow around an agent:

  • how to scope tasks
  • how to provide context
  • how to encode repository rules
  • how to configure environments
  • how to manage permissions and network access
  • how to validate results
  • how to automate repeatable work

Codex is most effective when you treat it like a configurable engineering teammate whose work still needs review, verification, and calibration.

Pick The Right Surface

Different Codex surfaces fit different jobs.

CLI: Local Pairing And Fast Changes

Codex CLI is best when you want tight local iteration inside the current repository: read code, edit files, run tests, and inspect diffs.

Good fits:

  • explaining an unfamiliar module
  • fixing a well-defined bug
  • doing a narrow refactor
  • adding tests around existing behavior
  • reviewing uncommitted local changes

The CLI is fast and direct, but you are responsible for context, permissions, and the local environment.
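
A typical session looks something like this (commands and paths are illustrative; flags change between releases, so verify with `codex --help`):

codex                                          # interactive session in the current repo
codex "explain the retry logic in src/queue/"  # open with a task
codex exec "run the unit tests and summarize failures"  # non-interactive run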

IDE Extension: Work Near The Code

The IDE extension is useful when the task starts from files, selections, diagnostics, screenshots, or errors already visible in your editor.

If the problem is “this code I am looking at,” the IDE extension is usually more natural than restating everything in a terminal prompt.

Codex App: Multi-Project Agent Work

OpenAI’s Help Center describes the Codex app as a way to run multiple agents in parallel across projects, with built-in support for worktrees, skills, automations, and git workflows.

Good fits:

  • parallel exploration across repositories
  • implementation work isolated in worktrees
  • reusable skill-based workflows
  • recurring background tasks
  • longer-running development threads

Codex Cloud: Delegate Background Work

Codex cloud creates an isolated cloud container for each task, checks out the selected repository state, runs setup, applies network settings, executes commands in a loop, edits code, validates work, and returns a diff.

Good fits:

  • clearly written issues
  • parallel solution exploration
  • fixes that should not block your editor session
  • background tasks that can become pull requests
  • changes that can be validated by tests and diff review

If a task still needs constant steering, clarify it locally first.

Write Tasks Like GitHub Issues

OpenAI’s own guidance is to structure Codex prompts like GitHub issues rather than vague wishes.

A good prompt usually includes four things:

  • Goal: what should change and why
  • Context: relevant files, directories, errors, screenshots, docs, or reference implementations
  • Constraints: what must not change and which patterns to follow
  • Done when: tests, lint, screenshots, diff expectations, or acceptance criteria

Weak:

optimize the checkout flow

Better:

In src/checkout/, reduce duplicate payment validation logic. Keep public API response shapes unchanged. Follow the pattern in src/orders/validation.ts. Add or update unit tests, then run the checkout test suite and summarize the diff.

This style does three practical things:

  1. narrows the change surface
  2. reduces guessing
  3. makes the result easier to review

Plan Before Coding Hard Tasks

For complex work, use this sequence:

  1. Ask Codex to read the code.
  2. Ask it to explain the current behavior.
  3. Ask it to propose a plan.
  4. Confirm the scope.
  5. Ask it to implement.
  6. Run validation.
  7. Review the diff.

Many bad outcomes happen because the first two steps are skipped. The agent starts applying a plausible solution before it understands the actual system.

Tasks that deserve planning:

  • cross-module refactors
  • data model migrations
  • authentication, billing, permissions, or audit changes
  • test strategy changes
  • CI/CD and deployment scripts
  • requirements you have not fully clarified yet

Start with something like:

Read the current authentication flow first. Do not edit files yet.
Explain the request path, list the files likely affected by token refresh,
then propose a minimal implementation plan with validation commands.

Then implement only after the plan looks right:

Implement the approved plan. Keep the public session format unchanged.
Run the auth unit tests and summarize any test gaps before finishing.

Encode Reusable Guidance In AGENTS.md

OpenAI’s Codex docs describe AGENTS.md as the project instruction file for agents. Codex reads these files before it starts work and merges global, project, and subdirectory guidance, with more specific files closer to the working directory taking precedence.
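
In practice that hierarchy looks like this (paths are illustrative; the global file location may differ by install):

~/.codex/AGENTS.md          personal defaults, applies everywhere
AGENTS.md                   repository-wide rules (repo root)
packages/api/AGENTS.md      overrides for work inside packages/api/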

A useful AGENTS.md is not a full engineering handbook. It should contain information Codex cannot reliably infer from the repository but must follow.

Include:

  • repository layout and directory responsibilities
  • local setup, test, lint, and formatting commands
  • PR or pre-commit expectations
  • boundaries that must not be crossed
  • team-specific conventions
  • reference implementations to copy
  • known traps and constraints

Example:

# Validation
- Run `pnpm test --filter checkout` after checkout changes.
- Run `pnpm lint` before finishing TypeScript edits.
- For database schema edits, run `pnpm prisma generate`.

# Constraints
- Do not change public API response shapes without explicit approval.
- Do not add production dependencies unless requested.
- Do not read or write secret files; only reference env var names.

# Patterns
- For validation helpers, follow `src/orders/validation.ts`.
- For background jobs, follow `src/jobs/send-digest.ts`.

Avoid two failure modes:

  • generic rules like “write clean code”
  • dumping the entire architecture history into the file

A practical rule: when Codex makes the same mistake twice, update AGENTS.md with the corrected rule.

Environment Quality Sets The Ceiling

Many Codex failures are environment failures.

In Codex cloud, a task roughly follows this flow:

  1. create a container
  2. check out the repository
  3. run the setup script
  4. apply network settings
  5. let the agent run commands, edit code, and validate work
  6. return an answer and diff

If dependencies do not install, validation commands are missing, environment variables are wrong, or setup is unstable, Codex starts guessing.

Configure these early:

  • runtime versions for Node, Python, Go, or other stacks
  • package manager and install commands
  • linters, formatters, and type checkers
  • validation commands in AGENTS.md
  • required environment variables in the cloud environment
  • idempotent setup scripts (see the sketch below)
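
A minimal idempotent setup script, assuming a pnpm/Prisma stack like the examples above (adapt the commands to yours):

#!/usr/bin/env bash
set -euo pipefail

# Every step is safe to re-run.
corepack enable                  # pin the package manager
pnpm install --frozen-lockfile   # install exactly what the lockfile says
pnpm prisma generate             # regenerate derived artifacts so validation works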

Be precise about secrets. Codex cloud docs distinguish secrets from regular environment variables: secrets are available to setup scripts, then removed before the agent phase.

Write secret-related tasks like this:

Ensure the code reads the OPENAI_API_KEY environment variable. Do not display or write the secret value.

Do not paste real secrets into prompts or repository files.

Keep Permissions And Network Tight By Default

Codex safety controls are mainly two knobs:

  • sandbox mode: what the agent can technically read, write, or access
  • approval policy: when it must stop and ask before acting

OpenAI’s docs emphasize that agent network access is off by default. Locally, Codex uses OS-level sandboxing to limit write access. In the cloud, it runs inside isolated OpenAI-managed containers.
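
In the CLI, both knobs live in config.toml. A conservative sketch, assuming current key names (verify against the config reference):

# ~/.codex/config.toml
approval_policy = "on-request"      # stop and ask before risky actions
sandbox_mode = "workspace-write"    # write only inside the workspace

[sandbox_workspace_write]
network_access = false              # keep the agent offline by default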

For low-risk work, looser automation may be reasonable:

  • formatting
  • unit tests
  • read-only exploration
  • documentation edits
  • small renames
  • local lint fixes

For higher-risk work, keep tighter control:

  • installing dependencies
  • changing deployment scripts
  • accessing the public internet
  • touching secrets
  • data migrations
  • bulk file deletion or movement
  • authentication, billing, permissions, or audit changes

Network access deserves special caution. Codex cloud keeps agent internet access off by default; if you enable it, allow only necessary domains and HTTP methods, then review the work log.

The reason is straightforward: when an agent reads untrusted web pages, issues, READMEs, or scripts, it can encounter prompt injection. Network access also expands the risk of code or secret exfiltration, malicious dependencies, and license contamination.

Validation Beats Generation Speed

Do not treat Codex’s final message as acceptance. Real acceptance comes from:

  • passing tests
  • passing lint or type checks
  • reproduced or fixed behavior
  • a diff that stays inside scope
  • screenshots or browser checks for UI work
  • review of bugs, regressions, and test gaps

Put acceptance criteria directly in the prompt:

Done when:
- `pnpm test --filter auth` passes
- no public API response shape changes
- the final answer lists changed files and any remaining risks

If there are no tests, ask Codex to add the smallest useful tests or write manual verification steps.

For pull requests, Codex /review and GitHub review workflows are useful as a second pass. They do not replace human review, but they can catch obvious bugs, regressions, scope creep, and missing tests before a human reviewer spends attention.

A solid loop:

  1. Codex implements.
  2. Codex runs validation.
  3. Codex reviews its diff.
  4. A human reviews high-risk areas.
  5. CI provides the final gate.

Use MCP Only For Context That Matters

Codex can use MCP to connect to external systems. MCP is useful when the context Codex needs does not live in the repository:

  • GitHub issues, pull requests, and CI status
  • Linear or Jira requirements
  • Slack discussions
  • database schemas
  • internal docs
  • design or operations systems

Do not connect every tool on day one. Each tool adds permission surface, failure modes, and context noise.

Before adding an MCP server, ask:

  • Does this context change often?
  • Is manual copy-paste expensive or error-prone?
  • Does Codex need to call a tool, not just read a note?
  • Will this integration be used repeatedly?

If the answer is unclear, wait. Start with one high-value integration and expand only after the workflow proves itself.
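
When an integration does earn its place, registering it is a small change. A sketch of one MCP server entry in config.toml (server name, package, and variable are hypothetical):

[mcp_servers.issue-tracker]
command = "npx"
args = ["-y", "issue-tracker-mcp"]
env = { ISSUE_TRACKER_URL = "https://tracker.example.com" }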

Skills Capture Method, Automations Set Cadence

OpenAI’s current Codex best-practices docs include Skills and Automations as first-class workflow tools.

The division is useful:

  • AGENTS.md: repository rules
  • MCP: external context and tools
  • Skills: reusable methods
  • Automations: scheduled execution of stable tasks

If you keep asking Codex to use the same review checklist, make it a skill. A skill can include SKILL.md, supporting scripts, references, and clear inputs and outputs.
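
A minimal SKILL.md sketch (the frontmatter follows the common skills convention; check the Skills docs for the exact schema):

---
name: pr-review-checklist
description: Review a pull request against the team checklist and report findings.
---

1. Read the PR description and the diff.
2. Check each item in checklist.md, bundled with this skill.
3. Report failures with file and line references; skip style-only notes.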

Good skill candidates:

  • PR review checklist
  • release note drafting
  • incident summaries
  • log triage
  • migration planning
  • frontend visual QA
  • dependency upgrade checks

Once a skill is stable, it may become an automation:

  • scan CI failures daily
  • summarize recent commits weekly
  • check dependency updates on a schedule
  • produce standup summaries
  • review recent Codex sessions and update recurring guidance

In short: Skills define how the work is done; Automations define when it runs.

Use Worktrees And Parallelism Carefully

Codex cloud and app workflows make parallel work natural: one agent fixes a bug, another writes tests, another reviews the diff, and another drafts a migration plan.

Parallelism is useful only when write scopes do not collide.

Good parallel splits:

  • backend change plus docs update
  • one agent writes tests while another investigates root cause
  • several agents propose alternative plans
  • one agent implements while another only reviews

Poor parallel splits:

  • several agents refactor the same files
  • multiple implementations before requirements are clear
  • schema and caller changes without coordination
  • several agents depending on an unstable shared abstraction

If you use Best-of-N or multiple agents, require each result to explain:

  • key assumptions
  • change scope
  • risks
  • validation plan
  • why this approach is better than alternatives

Do not choose the biggest diff or the most confident answer by default.

Follow A Goal For Long-Running Work

/goal is an experimental Codex CLI feature that lets Codex keep working toward one durable objective across multiple turns, rather than finishing after a single exchange, until it reaches a verifiable stopping condition. Once a goal is set, Codex can work independently for hours without further input.

Enable it from /experimental, or add goals = true under [features] in config.toml. Then set a goal with /goal <objective>, inspect status with /goal, and control the run with /goal pause, /goal resume, or /goal clear.
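
In config.toml that looks like:

[features]
goals = true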

A good goal is bigger than one prompt but smaller than an open-ended backlog. The key idea is the contract: Codex must know what “done” means before it starts. If the goal is a migration, “done” might mean the new path passes contract tests and the legacy path still has a rollback. If the goal is a prototype, “done” might mean the app builds, launches, and matches the expected behavior.

When To Use /goal

Good fits:

  • Code migration where the target stack, parity checks, and constraints are clearly defined.
  • Large refactors where Codex can run tests after each checkpoint.
  • Experiments, games, or prototypes where Codex can keep improving a working artifact.
  • Prompt optimization against an eval suite — Codex can inspect failures, update the prompt, rerun evals, and iterate until scores improve or the stopping condition is reached.

Avoid /goal for a loose list of unrelated tasks.

Set Up The Loop

  1. Name one objective and one stopping condition.
  2. Point Codex at the files, docs, issue, logs, or plan it must read first.
  3. Define the commands or artifacts that prove progress.
  4. Tell Codex to work in checkpoints and keep a short progress log.
  5. Use /goal to inspect status while it runs.
  6. Pause, resume, or clear when the run is done, blocked, or changing direction.

A useful start: have a short conversation about what you want to build, then ask Codex to set the goal itself and start working.

Let Codex Work Independently

During a goal run, ask for compact progress reports. A useful status update names the current checkpoint, what was verified, what remains, and whether Codex is blocked. If status becomes vague, tighten the goal rather than adding ad hoc instructions — tell Codex which checkpoint matters next, which command proves it, and what should cause it to pause.

Codex will stop when it is reasonably confident the stopping condition has been met. Think of /goal as a background task you do not need to monitor, not an interactive conversation.

Example Goals

Migration — moving a codebase to a new framework, a mobile app to a new platform, or games to a new stack:

/goal Migrate the payment module from stripe-v2 to stripe-v3.
Read the migration guide at docs/stripe-v3-migration.md first.
Work in checkpoints: one endpoint at a time. After each endpoint,
run `pnpm test --filter payment` and log the result.
Stop when all payment tests pass and no v2 imports remain.
Do not change the public API response shapes.

Prototype creation — building a new app, game, or feature to a polished first version:

/goal Build the CLI tool described in PLAN.md.
Read PLAN.md first for scope and constraints.
Work in checkpoints matching the phases in PLAN.md.
After each phase, verify the tool builds and runs a smoke test.
Stop when all phases are complete and the final smoke test passes.

Prompt optimization — iterating against an eval suite until the target score is reached:

/goal Improve the summarizer prompt in prompts/summarize.txt.
Run `python evals/run.py` to measure accuracy.
Inspect failures, update the prompt, rerun the evals.
Stop when accuracy reaches 85% or 20 iterations have run.
Log the score and prompt version after each iteration.

Understanding Codex Credits

Codex uses token-based pricing where you pay in credits per million tokens processed. Rates differ by model and token type. The table below reflects the rate card as of this writing — check the official Codex Rate Card for the latest figures.

Model          Input (per 1M)   Cached input (per 1M)   Output (per 1M)
GPT-5.5        125 credits      12.5 credits            750 credits
GPT-5.4        62.5 credits     6.25 credits            375 credits
GPT-5.4 mini   18.75 credits    1.875 credits           113 credits
GPT-5.3-Codex  43.75 credits    4.375 credits           350 credits
GPT-5.2        43.75 credits    4.375 credits           350 credits

1 credit ≈ $0.01. Code review uses GPT-5.3-Codex by default (check the rate card for current defaults). OpenAI reports typical usage falls between $100 and $200 per developer per month, with significant variance depending on usage patterns.

You can monitor credit consumption in the Usage panel inside Codex settings.
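
For intuition, a rough cost sketch using the rates above (token counts are hypothetical):

One GPT-5.3-Codex task: 1M fresh input + 1M cached input + 0.2M output
= (1 × 43.75) + (1 × 4.375) + (0.2 × 350)
= 43.75 + 4.375 + 70
= 118.125 credits ≈ $1.18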

A few things worth knowing:

  • Cached input is much cheaper: when Codex re-reads the same repository content across turns, caching kicks in automatically and reduces cost significantly. Long-context, long-running goals benefit most from this.
  • Match model to task importance: use higher-quality models for critical or high-stakes tasks; use smaller models for exploration, drafting, or repetitive work.
  • Tight stopping conditions save credits: a goal without a clear endpoint can run a long time without producing useful output. Good acceptance criteria let Codex stop sooner.

As a rule: credit budget is part of task design — clear stopping conditions are not just a quality practice, they are cost control.

Give Codex A Clear Review Role

Codex code review is now a major official workflow. OpenAI’s upgrade post says Codex can review PR intent against the diff, reason over the codebase and dependencies, run code and tests, and post reviews in GitHub automatically or when mentioned with @codex review.

The review prompt still matters.

Weak:

review this PR

Better:

Review this PR for correctness, security, and regression risk. Focus on changed behavior, missing tests, data migrations, and public API compatibility. Do not comment on style unless it hides a bug.

If your team has review standards, put them in code_review.md and reference that file from AGENTS.md.
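
For example, a pointer in AGENTS.md (wording illustrative):

# Review
- For /review and PR reviews, apply the checklist in `code_review.md`.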

Use Codex review for:

  • first-pass automated screening
  • supplementing human attention
  • catching repeated mistakes and obvious regressions
  • checking whether acceptance criteria were actually met

Do not treat it as:

  • the final architecture decision
  • a security sign-off
  • the final business semantics check
  • a replacement for CI or human review

A Repeatable Codex Workflow

For real projects, this is a good default:

  1. Prepare rules: keep a root AGENTS.md with directories, commands, constraints, and acceptance criteria.
  2. Describe the task: include goal, context, constraints, and done conditions.
  3. Plan first: for complex tasks, ask Codex to read the repo and propose a plan.
  4. Constrain scope: name writable files, forbidden behavior changes, and validation commands.
  5. Implement in small steps: avoid one giant diff.
  6. Validate: run tests, lint, type checks, screenshots, or manual verification.
  7. Review again: ask Codex to review the diff, then have a human inspect high-risk areas.
  8. Capture learning: move repeated corrections into AGENTS.md, skills, or automations.

Reusable prompt template:

Goal:
<what to implement or fix>

Context:
- Relevant files: <files/directories>
- Reference pattern: <existing implementation>
- Error/output: <error or observed behavior>

Constraints:
- Do not change <boundary>
- Follow <repo pattern>
- Ask before <high-risk action>

Done when:
- <test command> passes
- <behavior acceptance> is true
- Final response includes changed files, validation results, and remaining risks

Common Anti-Patterns

Anti-Pattern 1: One Giant Request

rebuild the billing system

Ask Codex to map the current system, risks, and a staged plan first. Then execute one subtask at a time.

Anti-Pattern 2: No Validation Command

Without validation, Codex can only reason statically. Static reasoning is useful, but it is not a substitute for running the system.

Anti-Pattern 3: Pasting Secrets

Only reference environment variable names. Never paste real secret values into prompts or repository files.

Anti-Pattern 4: Vague AGENTS.md

“Write good code” does not help much. Concrete commands, boundaries, and reference files do.

Anti-Pattern 5: Blind Network Access

Internet access can solve dependency and freshness problems, but it also brings prompt injection, exfiltration, and supply-chain risk. Keep it off by default and allow only what is necessary.

Anti-Pattern 6: Treating Review As A Formality

After Codex edits code, inspect the diff, run checks, and ask it to explain remaining risks. Otherwise you are just moving time from coding into debugging.

Conclusion

Codex’s core capability is not “automatic code generation.” It is an agent loop that can read, plan, edit, test, review, and repeat.

The best practices are engineering discipline:

  • keep tasks small and clear
  • provide locatable context
  • encode durable guidance
  • make environments reproducible
  • minimize permissions
  • validate outputs
  • turn repeatable workflows into tools

Do that, and Codex starts to behave less like a code generator and more like a reliable engineering teammate.
