GPT-5.5 Best Practices
What Is GPT-5.5
GPT-5.5 raises the baseline for complex production workflows. It is suited for: coding scenarios, heavy tool-calling agents, grounded assistants, long-context retrieval, product-spec-to-plan workflows, and customer-service scenarios that demand execution quality and polished responses.
Core principle: treat it as a new model family that needs re-tuning — not a drop-in replacement for GPT-4 or GPT-5.2. Start from the minimum viable prompt, then tune reasoning effort, verbosity, tool descriptions, and output format based on representative samples.
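As a concrete baseline, a minimum viable call might look like the sketch below (Python, OpenAI Responses API). The model name comes from this guide and the prompt text is illustrative; adjust both to your deployment.

```python
from openai import OpenAI

client = OpenAI()

# Minimum viable prompt: a short role/goal instruction plus the user input.
# Tune reasoning effort, verbosity, tools, and output format only after
# evaluating this baseline on representative samples.
response = client.responses.create(
    model="gpt-5.5",  # placeholder model name from this guide
    instructions=(
        "You are a customer-support agent. Resolve the customer's issue "
        "end to end and report completed_actions, customer_message, and blockers."
    ),
    input="My invoice for March was charged twice. Can you fix it?",
)

print(response.output_text)
```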
What’s New
More Efficient Reasoning
GPT-5.5 achieves the same results as prior models while using fewer reasoning tokens, even at the same reasoning effort level. The advantage compounds in complex, tool-heavy, multi-step workflows.
Better Outcome-First Execution
GPT-5.5 is better at working from a clear goal: it maintains constraints and translates product intent into concrete next steps. Describe the expected outcome, success criteria, allowed side effects, evidence rules, and output shape. Do not constrain it with step-by-step process instructions unless the path itself is a hard product requirement. This is exactly the outcome-first prompt pattern OpenAI recommends.
More Precise Tool Selection
GPT-5.5 is especially useful on large tool surfaces, multi-step service flows, and long-running agent tasks. Tool selection and parameter usage are both more precise, especially when your prompt includes explicit tool-calling rules.
More Polished but More Direct
GPT-5.5 typically produces warmer, more readable answers with less prompt scaffolding. It is also more direct: a benefit for many production workflows, but customer-facing and conversational products need an explicit personality definition.
Behavioral Changes
Reasoning Effort Default Is Now Medium
GPT-5.5’s reasoning_effort default is medium. Each level’s meaning (see OpenAI’s latest-model guide for the full parameter reference):
- none: Latency-sensitive scenarios that don’t need reasoning or multi-step tool chains — lightweight voice turns, quick information retrieval, classification
- low: Latency-sensitive but intelligence still matters — start evaluation here
- medium: Best balance of quality, reliability, latency, and cost
- high: Complex agent tasks requiring hard reasoning; latency is secondary
- xhigh: Hardest async agent tasks, or evals testing the model’s intelligence ceiling
Higher reasoning effort isn’t automatically better. If the task has conflicting instructions, weak stop conditions, or open-ended tool access, higher effort leads to overthinking, unnecessary searching, or degraded output quality. Upgrade only when evals prove quality improvement.
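A minimal sketch of setting the level per request (Python, Responses API), assuming the levels above map to the reasoning.effort request field as in earlier GPT-5 models; the model name is a placeholder from this guide:

```python
from openai import OpenAI

client = OpenAI()

# Start at "low" for latency-sensitive flows and raise the level only when
# evals show a real quality gain; "medium" is the documented default.
response = client.responses.create(
    model="gpt-5.5",              # placeholder model name from this guide
    reasoning={"effort": "low"},  # one of: none, low, medium, high, xhigh
    instructions="Classify the ticket into billing, technical, or account.",
    input="I can't log in after resetting my password.",
)

print(response.output_text)
```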
Image Inputs Preserve More Visual Detail
GPT-5.5 updated the default handling of image inputs to preserve more visual detail:
- auto (default): No upscaling, max 10,240,000 pixels or 6,000px dimension limit
- high: No upscaling, max 2,500,000 pixels or 2,048px dimension limit
- low: Prioritizes context efficiency, more aggressively downsamples above 512px
- original: Preserves original size, no scaling — suitable for tasks requiring visual precision, especially computer use, localization, OCR, and tasks requiring click accuracy
If your workflow relies on visual precision, explicitly specify the image_detail level in the prompt or integration layer rather than relying on auto. The model-side defaults are documented in the latest-model guide.
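A sketch of pinning the detail level on an image input (Python, Responses API). The "original" level and the exact pixel limits are as described in this guide, so verify them against the current API reference; the image URL and prompt are placeholders.

```python
from openai import OpenAI

client = OpenAI()

# Pin the detail level explicitly instead of relying on "auto" when the task
# needs visual precision (OCR, localization, click targets).
response = client.responses.create(
    model="gpt-5.5",  # placeholder model name from this guide
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Read the serial number printed on the label."},
                {
                    "type": "input_image",
                    "image_url": "https://example.com/label.png",
                    # "high" is a safe explicit choice; this guide also describes
                    # an "original" level for pixel-accurate tasks.
                    "detail": "high",
                },
            ],
        }
    ],
)

print(response.output_text)
```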
Default Style Is More Concise and Direct
GPT-5.5’s default style is efficient, direct, and task-oriented. Useful for production systems, but customer-service or conversational products need explicit personality, warmth, rationale, and format definition.
Use text.verbosity to control this: medium is the default, low is a good starting point for more concise replies (see OpenAI’s prompt-guidance guide for the full verbosity parameter reference).
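A minimal sketch of setting it per request (Python, Responses API), with the verbosity values documented for the GPT-5 family; the model name and prompt are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# "medium" is the API default; "low" is a good starting point for concise,
# task-oriented replies. Personality and warmth still belong in the prompt.
response = client.responses.create(
    model="gpt-5.5",            # placeholder model name from this guide
    text={"verbosity": "low"},  # one of: low, medium, high
    instructions="Answer billing questions warmly but briefly.",
    input="Why was I charged twice this month?",
)

print(response.output_text)
```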
Coding Workflows Need Stronger Orchestration
GPT-5.5 is better suited for complex coding tasks requiring planning, tool calls, codebase navigation, verification, and multi-step execution. For coding agents, be explicit about: reuse strategy, sub-agent delegation, test expectations, acceptance criteria, and when to proceed versus when to ask for help.
GPT-5.4 Strengths and Weaknesses
GPT-5.4 is the current mainstream choice for production-grade assistants and performs especially well in these scenarios:
Strengths
- Personality consistency: Less style drift in long answers
- Agentic robustness: More likely to persist through multi-step work, retries, and agent loops
- Evidence synthesis: Reliable in long-context or multi-tool workflows
- Modular instruction following: Skill-based and block-structured prompts work better with clear output contracts
- Long document analysis: Large documents, dirty data, multi-document inputs
- Parallel tool calling: Maintains accuracy during batch or parallel tool calls
- Excel/financial workflows: Strong instruction following, format fidelity, and self-verification
Weaknesses: These Scenarios Still Need Explicit Prompts
- Low-context tool routing: Tool selection may be inaccurate early in a session when context is still thin
- Dependency-aware workflows: Need explicit prerequisite and downstream step checking
- Reasoning effort selection: Higher effort doesn’t automatically mean better — it depends on task characteristics, not intuition
- Research tasks: Require strict source collection and consistent citations
- High-risk operations: Irreversible or high-impact actions need pre-execution verification
- Coding agents: Tool boundaries must be clear
Start from the minimum viable prompt and only add new instruction blocks after validating with evals.
Writing Outcome-First Prompts
Describe the Destination, Not Every Step Along the Way
This is the most important prompt principle for GPT-5.5.
❌ Don’t do this (process-heavy):
First inspect A, then inspect B, then compare every field, then think through
all possible exceptions, then decide which tool to call, then call the tool,
then explain the entire process to the user.
✅ Do this (outcome-first):
Resolve the customer's issue end to end.
Success means:
- the eligibility decision is made from the available policy and account data
- any allowed action is completed before responding
- the final answer includes completed_actions, customer_message, and blockers
- if evidence is missing, ask for the smallest missing field
Tell the model what “a good result looks like” and let it choose an efficient path to get there.
Don’t Copy Old Prompt Stacks During Migration
Many prompts were designed for earlier models: detailed steps, absolute directives (ALWAYS/NEVER/must/only). Earlier models needed these constraints to stay on track. GPT-5.5 does not. Migrating old prompts directly is the most common mistake.
Do not use absolute directives to control model behavior unless the directive describes a true invariant (safety rules, required fields, actions that must never happen). For decisions like “when to search, when to ask, when to use a tool,” prefer decision rules over absolute directives.
Personality and Collaboration Style
For customer service, coaching, and conversational products, explicitly define two dimensions:
- Personality: tone, warmth, directness, formality, humor
- Collaboration style: when to ask, when to assume, when to be proactive, when to check, when to acknowledge uncertainty
Keep both brief. Personality defines the user experience; collaboration style defines task behavior. Neither substitutes for clear goals, success criteria, tool rules, or stop conditions.
Template A: Steady Task-Oriented
# Personality
You are a capable collaborator: approachable, steady, and direct. Assume the user is competent and acting in good faith, and respond with patience, respect, and practical helpfulness.
Prefer making progress over stopping for clarification when the request is already clear enough to attempt. Use context and reasonable assumptions to move forward. Ask for clarification only when the missing information would materially change the answer or create meaningful risk, and keep any question narrow.
Stay concise without becoming curt. Give enough context for the user to understand and trust the answer, then stop. Use examples, comparisons, or simple analogies when they make the point easier to grasp. When correcting the user or disagreeing, be candid but constructive. When an error is pointed out, acknowledge it plainly and focus on fixing it.
Match the user's tone within professional bounds. Avoid emojis and profanity by default, unless the user explicitly asks for that style or has clearly established it as appropriate for the conversation.
Template B: Expressive Collaborative
# Personality
Adopt a vivid conversational presence: intelligent, curious, playful when appropriate, and attentive to the user's thinking. Ask good questions when the problem is blurry, then become decisive once there is enough context.
Be warm, collaborative, and polished. Conversation should feel easy and alive, but not chatty for its own sake. Offer a real point of view rather than merely mirroring the user, while staying responsive to their goals and constraints.
Be thoughtful and grounded when the task calls for synthesis or advice. State a clear recommendation when you have enough context, explain important tradeoffs, and name uncertainty without becoming evasive.
Streaming Output Preamble
GPT-5.5 may spend time reasoning, planning, or preparing tool calls before visible text appears. For multi-step or tool-intensive tasks, add a preamble before the actual content:
Before any tool calls for a multi-step task, send a short user-visible update that acknowledges the request and states the first step. Keep it to one or two sentences.
Coding agents can be more explicit:
You must always start with an intermediary update before any content in the analysis channel if the task will require calling tools. The user update should acknowledge the request and explain your first step.
This preamble doesn’t change the task itself but significantly improves perceived responsiveness in streaming scenarios.
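A sketch of streaming so the preamble reaches the user before tool work begins (Python, Responses API). The event type string follows the current Responses streaming API; verify it against the SDK version you are on. The model name and prompts are placeholders.

```python
from openai import OpenAI

client = OpenAI()

# Stream the response so the short user-visible update arrives immediately.
stream = client.responses.create(
    model="gpt-5.5",  # placeholder model name from this guide
    instructions=(
        "Before any tool calls for a multi-step task, send a short user-visible "
        "update that acknowledges the request and states the first step."
    ),
    input="Audit last quarter's refunds and summarize anomalies.",
    stream=True,
)

for event in stream:
    # Print text deltas as they arrive; other event types (tool calls,
    # completion) are ignored in this sketch.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```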
Stop Rules
Give the model explicit stop rules:
Resolve the user query in the fewest useful tool loops, but do not let loop minimization outrank correctness, accessible fallback evidence, calculations, or required citation tags for factual claims.
After each result, ask: "Can I answer the user's core request now with useful evidence and citations for the factual claims?" If yes, answer.
When evidence is missing:
Use the minimum evidence sufficient to answer correctly, cite it precisely, then stop.
Grounding and Citations
For tasks requiring factual citations, citation behavior should be part of the prompt. Define what needs support, what counts as sufficient evidence, and how the model behaves when evidence is absent. Missing evidence does not mean answering “I don’t know.”
citation_rules
- Only cite sources retrieved in the current workflow.
- Never fabricate citations, URLs, IDs, or quote spans.
- Use exactly the citation format required by the host application.
- Attach citations to the specific claims they support, not only at the end.
grounding_rules
- Base claims only on provided context or tool outputs.
- If sources conflict, state the conflict explicitly and attribute each side.
- If the context is insufficient or irrelevant, narrow the answer or say you cannot support the claim.
- If a statement is an inference rather than a directly supported fact, label it as an inference.
Retrieval Budgets
A retrieval budget is a stop rule for search — telling the model when enough is enough.
Worth retrieving again:
- Top results don’t answer the core question
- Key facts, parameters, dates, IDs, or sources are missing
- The user explicitly asks for exhaustive coverage
- Need to read a specific document, URL, email, meeting record, or code
- The answer will contain important factual claims without source support
Don’t retrieve again:
- Just to improve wording
- To add non-essential details
- To support statements that could safely be written more generally
Guardrails for Creative Tasks
For slides, promotional copy, client summaries, talk tracks, and other creative or generative requests, explicitly distinguish what must be sourced:
For creative or generative requests such as slides, leadership blurbs, outbound copy, summaries for sharing, talk tracks, or narrative framing, distinguish source-backed facts from creative wording.
- Use retrieved or provided facts for concrete product, customer, metric, roadmap, date, capability, and competitive claims, and cite those claims.
- Do not invent specific names, first-party data claims, metrics, roadmap status, customer outcomes, or product capabilities to make the draft sound stronger.
- If there is little or no citable support, write a useful generic draft with placeholders or clearly labeled assumptions rather than unsupported specifics.
Frontend Engineering and UI Quality
For frontend workflows, include in the prompt: product context, user context, design system consistency, above-the-fold usability, familiar controls, expected states, responsive behavior, and common generative UI defects to avoid:
- Generic heroes
- Nested cards
- Decorative gradients
- Visible instructional text
- Broken layouts
Having the Model Check Its Own Work
Programming Tasks
After making changes, run the most relevant validation available:
- targeted unit tests for changed behavior
- type checks or lint checks when applicable
- build checks for affected packages
- a minimal smoke test when full validation is too expensive
If validation cannot be run, explain why and describe the next best check.
Visual Artifacts
Render the artifact before finalizing. Inspect the rendered output for layout, clipping, spacing, missing content, and visual consistency. Revise until the rendered output matches the requirements.
Engineering and Planning Tasks
For implementation plans, include:
- Requirements and the corresponding solution for each
- Resources, files, APIs, or systems involved
- Relevant state transitions or data flows
- Validation commands or checks
- Failure behavior
- Privacy and security considerations
- Open questions that materially affect implementation
GPT-5.4 Tool Use Strategies
Pitfall 1: Inaccurate Tool Routing with Low Context
Early in a session when context is still thin, tool selection may be inaccurate:
tool_persistence_rules
- Use tools whenever they materially improve correctness, completeness, or grounding.
- Do not stop early when another tool call is likely to materially improve correctness or completeness.
- Keep calling tools until: (1) the task is complete, AND (2) verification passes.
- If a tool returns empty or partial results, retry with a different strategy.
dependency_checks
- Before taking an action, check whether prerequisite discovery, lookup, or memory retrieval steps are required.
- Do not skip prerequisite steps just because the intended final action seems obvious.
- If the task depends on the output of a prior step, resolve that dependency first.
Pitfall 2: Finishing Long Tasks Early
The model may declare a task complete too early when items in a batch are skipped or a retrieval returns empty results:
completeness_contract
- Treat the task as incomplete until all requested items are covered or explicitly marked [blocked].
- Keep an internal checklist of required deliverables.
- For lists, batches, or paginated results:
- determine expected scope when possible
- track processed items or pages
- confirm coverage before finalizing
- If any item is blocked by missing data, mark it [blocked] and state exactly what is missing.
Empty Result Recovery
When retrieval returns empty, partial, or clearly too narrow results:
empty_result_recovery
- Do not immediately conclude that no results exist.
- Try at least one or two fallback strategies:
- alternate query wording
- broader filters
- a prerequisite lookup
- an alternate source or tool
- Only then report that no results were found, along with what you tried.
Verification Loop Before High-Risk Actions
After a workflow appears complete, before returning the answer or executing an irreversible action, add a lightweight verification step:
verification_loop
Before finalizing:
- Check correctness: does the output satisfy every requirement?
- Check grounding: are factual claims backed by the provided context or tool outputs?
- Check formatting: does the output match the requested schema or style?
- Check safety and irreversibility: if the next step has external side effects, ask permission first.
No Speculation When Context Is Missing
missing_context_gating
- If required context is missing, do NOT guess.
- Prefer the appropriate lookup tool when the missing context is retrievable; ask a minimal clarifying question only when it is not.
- If you must proceed, label assumptions explicitly and choose a reversible action.
Agent Execution Safety Framework
For agents that actively execute actions, add an execution framework:
action_safety
- Pre-flight: summarize the intended action and parameters in 1-2 lines.
- Execute via tool.
- Post-flight: confirm the outcome and any validation that was performed.
Tool Calling: Concurrent vs. Sequential
parallel_tool_calling
- When multiple retrieval or lookup steps are independent, prefer parallel tool calls to reduce wall-clock time.
- Do not parallelize steps that have prerequisite dependencies or where one result determines the next action.
- After parallel retrieval, pause to synthesize the results before making more calls.
- Prefer selective parallelism: parallelize independent evidence gathering, not speculative or redundant tool use.
Core principle: parallelize independent tasks, serialize dependent tasks.
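A sketch of exposing two independent lookup tools so the model can fan them out in a single turn (Python, Responses API). The tool names and order ID are hypothetical, and the parallel_tool_calls flag is assumed to behave as in earlier OpenAI APIs; the full tool-execution loop is omitted.

```python
from openai import OpenAI

client = OpenAI()

# Two independent evidence-gathering tools; hypothetical names for illustration.
tools = [
    {
        "type": "function",
        "name": "lookup_order",
        "description": "Fetch an order record by order ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
    {
        "type": "function",
        "name": "lookup_refund_policy",
        "description": "Fetch the refund policy for a product category.",
        "parameters": {
            "type": "object",
            "properties": {"category": {"type": "string"}},
            "required": ["category"],
        },
    },
]

response = client.responses.create(
    model="gpt-5.5",           # placeholder model name from this guide
    tools=tools,
    parallel_tool_calls=True,  # assumed to allow independent calls in one turn
    instructions="Parallelize independent lookups; serialize dependent steps.",
    input="Is order 18423 eligible for a refund?",
)
# The response output will contain function_call items; execute them and feed
# the results back in a follow-up request (not shown in this sketch).
```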
Output Contract and Verbosity Control
GPT-5.4 gives precise control over output length and structure by pairing an output_contract block with the verbosity parameter (see the prompt-guidance guide for the full reference):
<output_contract>
- Return exactly the sections requested, in the requested order.
- If the prompt defines a preamble, analysis block, or working section, do not treat it as extra output.
- Apply length limits only to the section they are intended for.
- If a format is required (JSON, Markdown, SQL, XML), output only that format.
</output_contract>
<verbosity_controls>
- Prefer concise, information-dense writing.
- Avoid repeating the user's request.
- Keep progress updates brief.
- Do not shorten the answer so aggressively that required evidence, reasoning, or completion checks are omitted.
</verbosity_controls>
Follow-Through Strategies
Users frequently change the task, format, or tone mid-conversation. Define clear rules:
default_follow_through_policy
- If the user's intent is clear and the next step is reversible and low-risk, proceed without asking.
- Ask permission only if the next step is:
(a) irreversible,
(b) has external side effects (sending, purchasing, deleting, writing to production), or
(c) requires missing sensitive information or a choice that would materially change the outcome.
- If proceeding, briefly state what you did and what remains optional.
instruction_priority
- User instructions override default style, tone, formatting, and initiative preferences.
- Safety, honesty, privacy, and permission constraints do not yield.
- If a newer user instruction conflicts with an earlier one, follow the newer instruction.
- Preserve earlier instructions that do not conflict.
- Higher-priority developer or system instructions remain binding.
Mid-Conversation Instruction Updates
When instructions change, make updates explicit, limited, and localized:
When the change applies only to the next response:
<task_update>
For the next response only:
- Do not complete the task.
- Only produce a plan.
- Keep it to 5 bullets.
All earlier instructions still apply unless they conflict with this update.
</task_update>
When the task itself changes:
<task_update>
The task has changed.
Previous task: complete the workflow.
Current task: review the workflow and identify risks only.
Rules for this turn:
- Do not execute actions.
- Do not call destructive tools.
- Return exactly:
1. Main risks
2. Missing information
3. Recommended next step
</task_update>
Phase Parameter
Starting with GPT-5.4, long or tool-intensive Responses workflows use phase values to distinguish intermediate updates from the final answer (see the prompt-guidance guide for the full reference):
If manually replaying assistant items:
- Preserve assistant `phase` values exactly.
- Use `phase: "commentary"` for intermediate user-visible updates.
- Use `phase: "final_answer"` for the completed answer.
- Do not add `phase` to user messages.
If using previous_response_id, the API automatically preserves prior assistant state.
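A sketch of chaining turns with previous_response_id so prior assistant items, including any phase values, are carried forward without manual replay (Python, Responses API). The phase field itself is as described in this guide for GPT-5.4 and later, so verify it against the current API reference; the model name and inputs are placeholders.

```python
from openai import OpenAI

client = OpenAI()

first = client.responses.create(
    model="gpt-5.4",  # placeholder model name from this guide
    input="Review the migration plan and list the main risks.",
)

# Chaining with previous_response_id lets the API preserve prior assistant
# state, including any intermediate-update vs. final-answer phases, instead
# of replaying assistant items by hand.
follow_up = client.responses.create(
    model="gpt-5.4",
    previous_response_id=first.id,
    input="Now propose mitigations for the top two risks.",
)

print(follow_up.output_text)
```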
Prompt Structure Template
Role: [1-2 sentences defining the model's function, context, and work]
# Personality
[Tone, manner, collaboration style]
# Goal
[User-visible result]
# Success criteria
[Conditions that must be met before the final answer]
# Constraints
[Policy, safety, business, evidence, side-effect limits]
# Output
[Sections, length, tone]
# Stop rules
[When to retry, degrade, abandon, ask, or stop]
Keep each block brief and only add detail where it would actually change behavior.
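One way to wire such a template in is to pass it as the instructions field and keep the per-turn request in input (Python, Responses API). The block contents below are abbreviated placeholders, not a recommended prompt, and the model name comes from this guide.

```python
from openai import OpenAI

client = OpenAI()

# Abbreviated stand-in for the structured prompt above; keep each block short.
SYSTEM_PROMPT = """\
Role: You are a renewals assistant for the account team.

# Personality
Approachable, steady, direct.

# Goal
Draft a renewal summary the account owner can send as-is.

# Success criteria
Every factual claim is backed by provided account data.

# Constraints
No pricing commitments; no invented metrics.

# Output
Summary (<=200 words), then open questions as bullets.

# Stop rules
If required account data is missing, ask for the smallest missing field.
"""

response = client.responses.create(
    model="gpt-5.5",  # placeholder model name from this guide
    instructions=SYSTEM_PROMPT,
    input="Prepare the renewal summary for the attached account notes.",
)

print(response.output_text)
```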
Verbosity and Format Output
Control output length via text.verbosity — medium is the API default, low for more concise responses (see the prompt-guidance guide for the full verbosity parameter reference).
Format principles:
- Plain paragraphs for normal conversation, explanations, reports, and technical writing — the default format
- Titles, bold, and lists only when comparisons, rankings, or scanning are needed
- Follow explicit user preferences for conciseness or specific formats
When editing or rewriting, tell the model what to preserve while improving:
Preserve the requested artifact, length, structure, and genre first. Quietly improve clarity, flow, and correctness. Do not add new claims, extra sections, or a more promotional tone unless explicitly requested.
Specify audience and length explicitly:
Write for a senior business audience. Keep the answer under 400 words. Use short paragraphs and only include bullets when they improve scannability. Prioritize the conclusion first, then the reasoning, then caveats.