GPT-5.5 Best Practices
What Is GPT-5.5
GPT-5.5 raises the baseline for complex production workflows. It is suited for: coding scenarios, heavy tool-calling agents, grounded assistants, long-context retrieval, product-spec-to-plan workflows, and customer-service scenarios that demand execution quality and polished responses.
Core principle: treat it as a new model family that needs re-tuning — not a drop-in replacement for GPT-4 or GPT-5.2. Start from the minimum viable prompt, then tune reasoning effort, verbosity, tool descriptions, and output format based on representative samples.
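As a concrete baseline, a minimum viable call might look like the sketch below (Python, OpenAI Responses API). The model name comes from this guide and the prompt text is illustrative; adjust both to your deployment.

```python
from openai import OpenAI

client = OpenAI()

# Minimum viable prompt: a short role/goal instruction plus the user input.
# Tune reasoning effort, verbosity, tools, and output format only after
# evaluating this baseline on representative samples.
response = client.responses.create(
    model="gpt-5.5",  # placeholder model name from this guide
    instructions=(
        "You are a customer-support agent. Resolve the customer's issue "
        "end to end and report completed_actions, customer_message, and blockers."
    ),
    input="My invoice for March was charged twice. Can you fix it?",
)

print(response.output_text)
```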
What’s New
More Efficient Reasoning
GPT-5.5 achieves the same results as prior models while using fewer reasoning tokens, even at the same reasoning effort level. The advantage compounds in complex, tool-heavy, multi-step workflows.
Better Outcome-First Execution
GPT-5.5 is better at working from a clear goal: it maintains constraints and translates product intent into concrete next steps. Describe the expected outcome, success criteria, allowed side effects, evidence rules, and output shape. Do not constrain it with step-by-step process instructions unless the path itself is a hard product requirement. This is exactly the outcome-first prompt pattern OpenAI recommends.
More Precise Tool Selection
GPT-5.5 is especially useful on large tool surfaces, multi-step service flows, and long-running agent tasks. Tool selection and parameter usage are both more precise, especially when your prompt includes explicit tool-calling rules.
More Polished but More Direct
GPT-5.5 typically produces warmer, more readable answers with less prompt scaffolding. It is also more direct: a benefit for many production workflows, but customer-facing and conversational products need an explicit personality definition.
Behavioral Changes
Reasoning Effort Default Is Now Medium
GPT-5.5’s reasoning_effort default is medium. Each level’s meaning (see OpenAI’s latest-model guide for the full parameter reference):
- none: Latency-sensitive scenarios that don’t need reasoning or multi-step tool chains — lightweight voice turns, quick information retrieval, classification
- low: Latency-sensitive but intelligence still matters — start evaluation here
- medium: Best balance of quality, reliability, latency, and cost
- high: Complex agent tasks requiring hard reasoning; latency is secondary
- xhigh: Hardest async agent tasks, or evals testing the model’s intelligence ceiling
Higher reasoning effort isn’t automatically better. If the task has conflicting instructions, weak stop conditions, or open-ended tool access, higher effort leads to overthinking, unnecessary searching, or degraded output quality. Upgrade only when evals prove quality improvement.
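A minimal sketch of setting the level per request (Python, Responses API), assuming the levels above map to the reasoning.effort request field as in earlier GPT-5 models; the model name is a placeholder from this guide:

```python
from openai import OpenAI

client = OpenAI()

# Start at "low" for latency-sensitive flows and raise the level only when
# evals show a real quality gain; "medium" is the documented default.
response = client.responses.create(
    model="gpt-5.5",              # placeholder model name from this guide
    reasoning={"effort": "low"},  # one of: none, low, medium, high, xhigh
    instructions="Classify the ticket into billing, technical, or account.",
    input="I can't log in after resetting my password.",
)

print(response.output_text)
```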
Image Inputs Preserve More Visual Detail
GPT-5.5 updated the default handling of image inputs to preserve more visual detail:
- auto (default): No upscaling, max 10,240,000 pixels or 6,000px dimension limit
- high: No upscaling, max 2,500,000 pixels or 2,048px dimension limit
- low: Prioritizes context efficiency, more aggressively downsamples above 512px
- original: Preserves original size, no scaling — suitable for tasks requiring visual precision, especially computer use, localization, OCR, and tasks requiring click accuracy
If your workflow relies on visual precision, explicitly specify the image_detail level in the prompt or integration layer rather than relying on auto. The model-side defaults are documented in the latest-model guide.
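A sketch of pinning the detail level on an image input (Python, Responses API). The "original" level and the exact pixel limits are as described in this guide, so verify them against the current API reference; the image URL and prompt are placeholders.

```python
from openai import OpenAI

client = OpenAI()

# Pin the detail level explicitly instead of relying on "auto" when the task
# needs visual precision (OCR, localization, click targets).
response = client.responses.create(
    model="gpt-5.5",  # placeholder model name from this guide
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Read the serial number printed on the label."},
                {
                    "type": "input_image",
                    "image_url": "https://example.com/label.png",
                    # "high" is a safe explicit choice; this guide also describes
                    # an "original" level for pixel-accurate tasks.
                    "detail": "high",
                },
            ],
        }
    ],
)

print(response.output_text)
```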
Default Style Is More Concise and Direct
GPT-5.5’s default style is efficient, direct, and task-oriented. Useful for production systems, but customer-service or conversational products need explicit personality, warmth, rationale, and format definition.
Use text.verbosity to control this: medium is the default, low is a good starting point for more concise replies (see OpenAI’s prompt-guidance guide for the full verbosity parameter reference).
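A minimal sketch of setting it per request (Python, Responses API), with the verbosity values documented for the GPT-5 family; the model name and prompt are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# "medium" is the API default; "low" is a good starting point for concise,
# task-oriented replies. Personality and warmth still belong in the prompt.
response = client.responses.create(
    model="gpt-5.5",            # placeholder model name from this guide
    text={"verbosity": "low"},  # one of: low, medium, high
    instructions="Answer billing questions warmly but briefly.",
    input="Why was I charged twice this month?",
)

print(response.output_text)
```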
Coding Workflows Need Stronger Orchestration
GPT-5.5 is better suited for complex coding tasks requiring planning, tool calls, codebase navigation, verification, and multi-step execution. For coding agents, be explicit about: reuse strategy, sub-agent delegation, test expectations, acceptance criteria, and when to proceed versus when to ask for help.
GPT-5.4 Strengths and Weaknesses
GPT-5.4 is the current mainstream choice for production-grade assistants and performs especially well in these scenarios:
Strengths
- Personality consistency: Less style drift in long answers
- Agentic robustness: More likely to persist through multi-step work, retries, and agent loops
- Evidence synthesis: Reliable in long-context or multi-tool workflows
- Modular instruction following: Skill-based and block-structured prompts work better with clear output contracts
- Long document analysis: Large documents, dirty data, multi-document inputs
- Parallel tool calling: Maintains accuracy during batch or parallel tool calls
- Excel/financial workflows: Strong instruction following, format fidelity, and self-verification
Weaknesses: These Scenarios Still Need Explicit Prompts
- Low-context tool routing: Tool selection may be inaccurate early in a session when context is still thin
- Dependency-aware workflows: Need explicit prerequisite and downstream step checking
- Reasoning effort selection: Higher effort doesn’t automatically mean better — it depends on task characteristics, not intuition
- Research tasks: Require strict source collection and consistent citations
- High-risk operations: Irreversible or high-impact actions need pre-execution verification
- Coding agents: Tool boundaries must be clear
Start from the minimum viable prompt and only add new instruction blocks after validating with evals.
Writing Outcome-First Prompts
Describe the Destination, Not Every Step Along the Way
This is the most important prompt principle for GPT-5.5.
❌ Don’t do this (process-heavy):
First inspect A, then inspect B, then compare every field, then think through
all possible exceptions, then decide which tool to call, then call the tool,
then explain the entire process to the user.
✅ Do this (outcome-first):
Resolve the customer's issue end to end.
Success means:
- the eligibility decision is made from the available policy and account data
- any allowed action is completed before responding
- the final answer includes completed_actions, customer_message, and blockers
- if evidence is missing, ask for the smallest missing field
Tell the model what “a good result looks like” and let it choose an efficient path to get there.
Don’t Copy Old Prompt Stacks During Migration
Many prompts were designed for earlier models: detailed steps, absolute directives (ALWAYS/NEVER/must/only). Earlier models needed these constraints to stay on track. GPT-5.5 does not. Migrating old prompts directly is the most common mistake.
Do not use absolute directives to control model behavior unless the directive describes a true invariant (safety rules, required fields, actions that must never happen). For decisions like “when to search, when to ask, when to use a tool,” prefer decision rules over absolute directives.
Personality and Collaboration Style
For customer service, coaching, and conversational products, explicitly define two dimensions:
- Personality: tone, warmth, directness, formality, humor
- Collaboration style: when to ask, when to assume, when to be proactive, when to check, when to acknowledge uncertainty
Keep both brief. Personality defines the user experience; collaboration style defines task behavior. Neither substitutes for clear goals, success criteria, tool rules, or stop conditions.
Template A: Steady Task-Oriented
# Personality
You are a capable collaborator: approachable, steady, and direct. Assume the user is competent and acting in good faith, and respond with patience, respect, and practical helpfulness.
Prefer making progress over stopping for clarification when the request is already clear enough to attempt. Use context and reasonable assumptions to move forward. Ask for clarification only when the missing information would materially change the answer or create meaningful risk, and keep any question narrow.
Stay concise without becoming curt. Give enough context for the user to understand and trust the answer, then stop. Use examples, comparisons, or simple analogies when they make the point easier to grasp. When correcting the user or disagreeing, be candid but constructive. When an error is pointed out, acknowledge it plainly and focus on fixing it.
Match the user's tone within professional bounds. Avoid emojis and profanity by default, unless the user explicitly asks for that style or has clearly established it as appropriate for the conversation.
Template B: Expressive Collaborative
# Personality
Adopt a vivid conversational presence: intelligent, curious, playful when appropriate, and attentive to the user's thinking. Ask good questions when the problem is blurry, then become decisive once there is enough context.
Be warm, collaborative, and polished. Conversation should feel easy and alive, but not chatty for its own sake. Offer a real point of view rather than merely mirroring the user, while staying responsive to their goals and constraints.
Be thoughtful and grounded when the task calls for synthesis or advice. State a clear recommendation when you have enough context, explain important tradeoffs, and name uncertainty without becoming evasive.
Streaming Output Preamble
GPT-5.5 may spend time reasoning, planning, or preparing tool calls before visible text appears. For multi-step or tool-intensive tasks, add a preamble before the actual content:
Before any tool calls for a multi-step task, send a short user-visible update that acknowledges the request and states the first step. Keep it to one or two sentences.
Coding agents can be more explicit:
You must always start with an intermediary update before any content in the analysis channel if the task will require calling tools. The user update should acknowledge the request and explain your first step.
This preamble doesn’t change the task itself but significantly improves perceived responsiveness in streaming scenarios.
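A sketch of streaming so the preamble reaches the user before tool work begins (Python, Responses API). The event type string follows the current Responses streaming API; verify it against the SDK version you are on. The model name and prompts are placeholders.

```python
from openai import OpenAI

client = OpenAI()

# Stream the response so the short user-visible update arrives immediately.
stream = client.responses.create(
    model="gpt-5.5",  # placeholder model name from this guide
    instructions=(
        "Before any tool calls for a multi-step task, send a short user-visible "
        "update that acknowledges the request and states the first step."
    ),
    input="Audit last quarter's refunds and summarize anomalies.",
    stream=True,
)

for event in stream:
    # Print text deltas as they arrive; other event types (tool calls,
    # completion) are ignored in this sketch.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```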
Stop Rules
Give the model explicit stop rules:
Resolve the user query in the fewest useful tool loops, but do not let loop minimization outrank correctness, accessible fallback evidence, calculations, or required citation tags for factual claims.
After each result, ask: "Can I answer the user's core request now with useful evidence and citations for the factual claims?" If yes, answer.
When evidence is missing:
Use the minimum evidence sufficient to answer correctly, cite it precisely, then stop.
Grounding and Citations
For tasks requiring factual citations, citation behavior should be part of the prompt. Define what needs support, what counts as sufficient evidence, and how the model behaves when evidence is absent. Missing evidence does not mean answering “I don’t know.”
citation_rules
- Only cite sources retrieved in the current workflow.
- Never fabricate citations, URLs, IDs, or quote spans.
- Use exactly the citation format required by the host application.
- Attach citations to the specific claims they support, not only at the end.
grounding_rules
- Base claims only on provided context or tool outputs.
- If sources conflict, state the conflict explicitly and attribute each side.
- If the context is insufficient or irrelevant, narrow the answer or say you cannot support the claim.
- If a statement is an inference rather than a directly supported fact, label it as an inference.
Retrieval Budgets
A retrieval budget is a stop rule for search — telling the model when enough is enough.
Worth retrieving again:
- Top results don’t answer the core question
- Key facts, parameters, dates, IDs, or sources are missing
- The user explicitly asks for exhaustive coverage
- Need to read a specific document, URL, email, meeting record, or code
- The answer will contain important factual claims without source support
Don’t retrieve again:
- Just to improve wording
- To add non-essential details
- To support statements that could safely be written more generally
Guardrails for Creative Tasks
For slides, promotional copy, client summaries, talk tracks, and other creative or generative requests, explicitly distinguish what must be sourced:
For creative or generative requests such as slides, leadership blurbs, outbound copy, summaries for sharing, talk tracks, or narrative framing, distinguish source-backed facts from creative wording.
- Use retrieved or provided facts for concrete product, customer, metric, roadmap, date, capability, and competitive claims, and cite those claims.
- Do not invent specific names, first-party data claims, metrics, roadmap status, customer outcomes, or product capabilities to make the draft sound stronger.
- If there is little or no citable support, write a useful generic draft with placeholders or clearly labeled assumptions rather than unsupported specifics.
Frontend Engineering and UI Quality
For frontend workflows, include in the prompt: product context, user context, design system consistency, above-the-fold usability, familiar controls, expected states, responsive behavior, and common generative UI defects to avoid:
- Generic heroes
- Nested cards
- Decorative gradients
- Visible instructional text
- Broken layouts
Having the Model Check Its Own Work
Programming Tasks
After making changes, run the most relevant validation available:
- targeted unit tests for changed behavior
- type checks or lint checks when applicable
- build checks for affected packages
- a minimal smoke test when full validation is too expensive
If validation cannot be run, explain why and describe the next best check.
Visual Artifacts
Render the artifact before finalizing. Inspect the rendered output for layout, clipping, spacing, missing content, and visual consistency. Revise until the rendered output matches the requirements.
Engineering and Planning Tasks
For implementation plans, include:
- Requirements and the corresponding solution for each
- Resources, files, APIs, or systems involved
- Relevant state transitions or data flows
- Validation commands or checks
- Failure behavior
- Privacy and security considerations
- Open questions that materially affect implementation
GPT-5.4 Tool Use Strategies
Pitfall 1: Inaccurate Tool Routing with Low Context
Early in a session when context is still thin, tool selection may be inaccurate:
tool_persistence_rules
- Use tools whenever they materially improve correctness, completeness, or grounding.
- Do not stop early when another tool call is likely to materially improve correctness or completeness.
- Keep calling tools until: (1) the task is complete, AND (2) verification passes.
- If a tool returns empty or partial results, retry with a different strategy.
dependency_checks
- Before taking an action, check whether prerequisite discovery, lookup, or memory retrieval steps are required.
- Do not skip prerequisite steps just because the intended final action seems obvious.
- If the task depends on the output of a prior step, resolve that dependency first.
Pitfall 2: Finishing Long Tasks Early
The model may declare a task complete too early when items in a batch are skipped or a retrieval returns empty results:
completeness_contract
- Treat the task as incomplete until all requested items are covered or explicitly marked [blocked].
- Keep an internal checklist of required deliverables.
- For lists, batches, or paginated results:
- determine expected scope when possible
- track processed items or pages
- confirm coverage before finalizing
- If any item is blocked by missing data, mark it [blocked] and state exactly what is missing.
Empty Result Recovery
When retrieval returns empty, partial, or clearly too narrow results:
empty_result_recovery
- Do not immediately conclude that no results exist.
- Try at least one or two fallback strategies:
- alternate query wording
- broader filters
- a prerequisite lookup
- an alternate source or tool
- Only then report that no results were found, along with what you tried.
Verification Loop Before High-Risk Actions
After a workflow appears complete, before returning the answer or executing an irreversible action, add a lightweight verification step:
verification_loop
Before finalizing:
- Check correctness: does the output satisfy every requirement?
- Check grounding: are factual claims backed by the provided context or tool outputs?
- Check formatting: does the output match the requested schema or style?
- Check safety and irreversibility: if the next step has external side effects, ask permission first.
No Speculation When Context Is Missing
missing_context_gating
- If required context is missing, do NOT guess.
- Prefer the appropriate lookup tool when the missing context is retrievable; ask a minimal clarifying question only when it is not.
- If you must proceed, label assumptions explicitly and choose a reversible action.
Agent Execution Safety Framework
For agents that actively execute actions, add an execution framework:
action_safety
- Pre-flight: summarize the intended action and parameters in 1-2 lines.
- Execute via tool.
- Post-flight: confirm the outcome and any validation that was performed.
Tool Calling: Concurrent vs. Sequential
parallel_tool_calling
- When multiple retrieval or lookup steps are independent, prefer parallel tool calls to reduce wall-clock time.
- Do not parallelize steps that have prerequisite dependencies or where one result determines the next action.
- After parallel retrieval, pause to synthesize the results before making more calls.
- Prefer selective parallelism: parallelize independent evidence gathering, not speculative or redundant tool use.
Core principle: parallelize independent tasks, serialize dependent tasks.
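A sketch of exposing two independent lookup tools so the model can fan them out in a single turn (Python, Responses API). The tool names and order ID are hypothetical, and the parallel_tool_calls flag is assumed to behave as in earlier OpenAI APIs; the full tool-execution loop is omitted.

```python
from openai import OpenAI

client = OpenAI()

# Two independent evidence-gathering tools; hypothetical names for illustration.
tools = [
    {
        "type": "function",
        "name": "lookup_order",
        "description": "Fetch an order record by order ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
    {
        "type": "function",
        "name": "lookup_refund_policy",
        "description": "Fetch the refund policy for a product category.",
        "parameters": {
            "type": "object",
            "properties": {"category": {"type": "string"}},
            "required": ["category"],
        },
    },
]

response = client.responses.create(
    model="gpt-5.5",           # placeholder model name from this guide
    tools=tools,
    parallel_tool_calls=True,  # assumed to allow independent calls in one turn
    instructions="Parallelize independent lookups; serialize dependent steps.",
    input="Is order 18423 eligible for a refund?",
)
# The response output will contain function_call items; execute them and feed
# the results back in a follow-up request (not shown in this sketch).
```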
Output Contract and Verbosity Control
GPT-5.4 gives precise control over output length and structure by pairing an output_contract block with the verbosity parameter (see the prompt-guidance guide for the full reference):
<output_contract>
- Return exactly the sections requested, in the requested order.
- If the prompt defines a preamble, analysis block, or working section, do not treat it as extra output.
- Apply length limits only to the section they are intended for.
- If a format is required (JSON, Markdown, SQL, XML), output only that format.
</output_contract>
<verbosity_controls>
- Prefer concise, information-dense writing.
- Avoid repeating the user's request.
- Keep progress updates brief.
- Do not shorten the answer so aggressively that required evidence, reasoning, or completion checks are omitted.
</verbosity_controls>
Follow-Through Strategies
Users frequently change the task, format, or tone mid-conversation. Define clear rules:
default_follow_through_policy
- If the user's intent is clear and the next step is reversible and low-risk, proceed without asking.
- Ask permission only if the next step is:
(a) irreversible,
(b) has external side effects (sending, purchasing, deleting, writing to production), or
(c) requires missing sensitive information or a choice that would materially change the outcome.
- If proceeding, briefly state what you did and what remains optional.
instruction_priority
- User instructions override default style, tone, formatting, and initiative preferences.
- Safety, honesty, privacy, and permission constraints do not yield.
- If a newer user instruction conflicts with an earlier one, follow the newer instruction.
- Preserve earlier instructions that do not conflict.
- Higher-priority developer or system instructions remain binding.
Mid-Conversation Instruction Updates
When instructions change, make updates explicit, limited, and localized:
When the change applies only to the next response:
<task_update>
For the next response only:
- Do not complete the task.
- Only produce a plan.
- Keep it to 5 bullets.
All earlier instructions still apply unless they conflict with this update.
</task_update>
When the task itself changes:
<task_update>
The task has changed.
Previous task: complete the workflow.
Current task: review the workflow and identify risks only.
Rules for this turn:
- Do not execute actions.
- Do not call destructive tools.
- Return exactly:
1. Main risks
2. Missing information
3. Recommended next step
</task_update>
Phase Parameter
Starting with GPT-5.4, long or tool-intensive Responses workflows use phase values to distinguish intermediate updates from the final answer (see the prompt-guidance guide for the full reference):
If manually replaying assistant items:
- Preserve assistant `phase` values exactly.
- Use `phase: "commentary"` for intermediate user-visible updates.
- Use `phase: "final_answer"` for the completed answer.
- Do not add `phase` to user messages.
If using previous_response_id, the API automatically preserves prior assistant state.
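A sketch of chaining turns with previous_response_id so prior assistant items, including any phase values, are carried forward without manual replay (Python, Responses API). The phase field itself is as described in this guide for GPT-5.4 and later, so verify it against the current API reference; the model name and inputs are placeholders.

```python
from openai import OpenAI

client = OpenAI()

first = client.responses.create(
    model="gpt-5.4",  # placeholder model name from this guide
    input="Review the migration plan and list the main risks.",
)

# Chaining with previous_response_id lets the API preserve prior assistant
# state, including any intermediate-update vs. final-answer phases, instead
# of replaying assistant items by hand.
follow_up = client.responses.create(
    model="gpt-5.4",
    previous_response_id=first.id,
    input="Now propose mitigations for the top two risks.",
)

print(follow_up.output_text)
```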
Prompt Structure Template
Role: [1-2 sentences defining the model's function, context, and work]
# Personality
[Tone, manner, collaboration style]
# Goal
[User-visible result]
# Success criteria
[Conditions that must be met before the final answer]
# Constraints
[Policy, safety, business, evidence, side-effect limits]
# Output
[Sections, length, tone]
# Stop rules
[When to retry, degrade, abandon, ask, or stop]
Keep each block brief and only add detail where it would actually change behavior.
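One way to wire such a template in is to pass it as the instructions field and keep the per-turn request in input (Python, Responses API). The block contents below are abbreviated placeholders, not a recommended prompt, and the model name comes from this guide.

```python
from openai import OpenAI

client = OpenAI()

# Abbreviated stand-in for the structured prompt above; keep each block short.
SYSTEM_PROMPT = """\
Role: You are a renewals assistant for the account team.

# Personality
Approachable, steady, direct.

# Goal
Draft a renewal summary the account owner can send as-is.

# Success criteria
Every factual claim is backed by provided account data.

# Constraints
No pricing commitments; no invented metrics.

# Output
Summary (<=200 words), then open questions as bullets.

# Stop rules
If required account data is missing, ask for the smallest missing field.
"""

response = client.responses.create(
    model="gpt-5.5",  # placeholder model name from this guide
    instructions=SYSTEM_PROMPT,
    input="Prepare the renewal summary for the attached account notes.",
)

print(response.output_text)
```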
Verbosity and Format Output
Control output length via text.verbosity — medium is the API default, low for more concise responses (see the prompt-guidance guide for the full verbosity parameter reference).
Format principles:
- Plain paragraphs for normal conversation, explanations, reports, and technical writing — the default format
- Titles, bold, and lists only when comparisons, rankings, or scanning are needed
- Follow explicit user preferences for conciseness or specific formats
When editing or rewriting, tell the model what to preserve while improving:
Preserve the requested artifact, length, structure, and genre first. Quietly improve clarity, flow, and correctness. Do not add new claims, extra sections, or a more promotional tone unless explicitly requested.
Specify audience and length explicitly:
Write for a senior business audience. Keep the answer under 400 words. Use short paragraphs and only include bullets when they improve scannability. Prioritize the conclusion first, then the reasoning, then caveats.