GPT Best Practices

Most teams fail with LLMs not because the model isn’t powerful enough, but because they’re still treating it like a search box or an infallible consultant.
The reality is simpler and more honest: models are probabilistic assistants. They’re fast, capable, and prone to confident mistakes. What you put in—task boundaries, input materials, verification mechanisms—determines the ceiling of what comes out.
This isn’t about the latest buttons on any particular product or chasing model naming trends. It’s about one thing that matters more: how to make GPT-style models controllable, verifiable, and actually useful in real work.
First: Models Aren’t Oracles, They’re Systems with Error Distributions
If you treat the model as “the person who knows the answer,” you’ll be constantly disappointed. If you treat it as “a system that generates the most plausible-sounding answer based on context,” many practices suddenly make sense.
This reframing leads directly to working principles:
- Don’t ask “can it answer?”—ask “how will I know if it’s wrong?”
- Don’t chase a single correct first answer—design iterative correction workflows
- Don’t place the model in situations requiring hard certainty without human oversight
- If something can be handled by rules, scripts, search, or databases, don’t make the model guess
More specifically: models are good at accelerating cognitive work, not at bearing responsibility for final decisions.
1. Define the Work Before You Ask for “Something Written”
Most failures come from vague task definitions, not short prompts.
“Help me write a proposal” is almost impossible to execute. Proposal for whom? Solving what problem? What constraints? What shouldn’t be included? What format—email, PRD, SQL migration plan, or five decision points for a meeting?
A workable task needs at least four things:
- Objective: What problem does this output solve?
- Audience: Boss, client, engineer, or your own draft?
- Constraints: Time, length, tone, compliance, tech stack, boundaries
- Completion criteria: What counts as done, what doesn’t
Here’s a concrete example:
You need to write an interface change notice for front-end and back-end alignment.
Objective:
- Explain why user profile editing changed from full replacement to field-level patch
- List compatibility impact and migration steps
Audience:
- Frontend engineers, backend engineers, QA
Constraints:
- No background stories or marketing language
- Based on existing REST API
- Must distinguish breaking vs non-breaking changes
- Under 800 characters
Completion criteria:
- Frontend and backend can start coding after reading
- QA can derive regression test cases
- If information is insufficient, list what's missing first—don't fabricate
The closer your task is to an “executable work order,” the more the output looks like work; the closer to small talk, the more it reads like filler.
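If you assemble prompts like this repeatedly, the four-part task definition can live in a small structure that renders into a prompt. A minimal sketch; the `TaskSpec` name and field layout are illustrative, not a standard API:

```python
from dataclasses import dataclass


@dataclass
class TaskSpec:
    """The four parts of a workable task: objective, audience, constraints, done-criteria."""
    objective: str
    audience: str
    constraints: list[str]
    completion_criteria: list[str]

    def to_prompt(self) -> str:
        """Render the spec as a plain-text work order for the model."""
        lines = [f"Objective: {self.objective}", f"Audience: {self.audience}", "Constraints:"]
        lines += [f"- {c}" for c in self.constraints]
        lines.append("Completion criteria:")
        lines += [f"- {c}" for c in self.completion_criteria]
        return "\n".join(lines)
```

The point of the structure is not the code itself but that every prompt you send carries all four parts, so omissions become visible before the model fills them with guesses.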
2. Context Isn’t Garnish, It’s the Main Ingredient
The most common model problem isn’t inability to write—it’s not knowing your specific world.
Things you take for granted, the model doesn’t know:
- Your company’s terminology
- Current project phase
- Historical decisions
- Existing tech stack
- Organizational dynamics
- Who the users are, who shouldn’t be affected
- Which conclusions are already settled, which are still open
Three Layers of Context to Always Provide
Business context: What’s the scenario? What metric are we optimizing? Where do constraints come from?
“Build a login page” vs. “Redesign the login flow for a financial admin panel, must support SMS 2FA, enterprise SSO, and audit logging”—not the same task at all.
Material context: Give it raw materials, not your secondhand summaries. Common materials:
- Existing docs
- Code snippets
- Meeting notes
- User feedback
- Log samples
- Schema definitions
- API responses
- Error traces
The closer to the actual work environment, the less the model drifts.
Output context: What will you do with this content? Post to Slack, write a commit message, present to leadership, draft contract terms, write test cases, or reply to support tickets? Purpose changes structure.
3. When You Have Source Material, Use It Instead of Making the Model Guess
Modern models are usually more reliable at “organizing content you provide” than “generating complete facts from memory.”
For tasks involving internal knowledge, latest policies, technical implementation details, compliance requirements, or anything requiring cited sources—do grounding with credible materials.
Simple rule:
If any sentence in the answer requires you to take factual responsibility, provide source material.
Good approaches
- Give meeting notes, let it extract decisions and action items
- Give API docs plus actual responses, let it write integration guides
- Give 20 user feedback items, let it categorize issues
- Give contract clauses, let it flag risk points
- Give error logs, let it propose hypotheses ranked by likelihood
Bad approaches
- “You know our company’s membership rules, right?”
- “Update this according to the latest regulations”
- “Write a module matching this system’s code style”—without providing the code
- “Summarize recent industry changes”—without specifying time range or sources
A practical template
Answer only based on materials I provide.
If materials don't support a conclusion, explicitly write "Insufficient information."
Do not add your own speculative guesses.
When citing specific sources, note the source number at sentence end.
Materials:
[1] ...
[2] ...
[3] ...
This isn’t formalism. Much “hallucination” comes from you assuming it should know while providing no evidence.
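The grounding template above is easy to assemble programmatically once materials are collected. A minimal sketch; the `build_grounded_prompt` helper and its wording are illustrative:

```python
def build_grounded_prompt(question: str, materials: list[str]) -> str:
    """Assemble a grounded prompt: strict answer rules plus numbered source materials."""
    rules = (
        "Answer only based on materials I provide.\n"
        'If materials don\'t support a conclusion, explicitly write "Insufficient information."\n'
        "Do not add your own speculative guesses.\n"
        "When citing specific sources, note the source number at sentence end.\n"
    )
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(materials, start=1))
    return f"{rules}\nQuestion: {question}\n\nMaterials:\n{numbered}"
```

Numbering the sources in code rather than by hand keeps citations stable when you add or reorder materials.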
4. Prompts Need Structure, Not Walls of Text
The worst prompts are dozens of lines lumped into one paragraph—goals, constraints, background, exceptions, and format all mixed together. Hard for humans to parse, and even harder for models not to miss conditions.
A useful basic structure:
Task:
Background:
Input materials:
Constraints:
Output format:
Acceptance criteria:
For complex tasks, add:
- What NOT to do
- What to do when information is insufficient
- Analysis first or direct output
- Whether to list risks and assumptions
- Whether to distinguish facts, inferences, and recommendations
5. Prefer “Checkable” Over “Looks Good”
Many outputs are unusable not because they’re poorly written, but because they’re impossible to verify.
“Write something more professional” usually just produces better-sounding nonsense. Demanding outputs in formats that can be audited, diffed, and handed off has much higher value.
Formats to prefer
- Tables
- JSON / YAML
- Checklists
- Step-by-step plans
- Risk lists
- Assumption lists
- Decision records
- Test cases
- Three-column structure: Conclusion / Evidence / Uncertainty
Instead of “summarize this document,” try:
Output a three-column table:
- Conclusion
- Supporting evidence from source
- Remaining uncertainties
If a conclusion lacks explicit support, don't write it.
Or:
Output JSON with fixed fields:
{
"problem": "",
"root_causes": [],
"constraints": [],
"proposed_actions": [],
"open_questions": []
}
Formatting isn’t about aesthetics—it’s about making output machine-processable, human-auditable, and reusable.
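Fixed fields only pay off if something actually checks them. A minimal sketch of validating model output against the field list above before anything downstream consumes it; the `SCHEMA` dict mirrors the example fields and is not a standard API:

```python
import json

# Expected fields and their types, matching the fixed-field template above.
SCHEMA = {
    "problem": str,
    "root_causes": list,
    "constraints": list,
    "proposed_actions": list,
    "open_questions": list,
}


def validate_output(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the output is usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    problems = []
    for field, expected in SCHEMA.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            problems.append(f"wrong type for {field}: expected {expected.__name__}")
    return problems
```

A failed validation is also useful feedback: paste the problem list back into the conversation and ask for a corrected version.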
6. Break Complex Tasks Down—Don’t Expect One-Shot Perfection
Models perform well on single-step tasks. Cramming “understand the problem, research, judge, weigh options, organize, generate final draft” into one call causes quality to degrade noticeably.
A more reliable approach: pipeline decomposition.
Step 1: Understanding—have it restate the goal, list missing information, identify risks
Step 2: Process materials—extract structure, summarize facts, don’t conclude yet
Step 3: Generate candidates—give 2-3 directions with tradeoffs, not final drafts
Step 4: Converge—you add constraints, eliminate unsuitable directions, have it continue
Step 5: Final format—write the document, email, PR, script, or page copy
Why this works: each step becomes checkable. You catch misunderstanding before it propagates through the entire output.
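The first three steps above can be sketched as separate calls whose intermediate results you inspect; `llm` here is a stand-in for whatever client function you actually use, and the stage prompts are illustrative:

```python
from typing import Callable


def run_pipeline(task: str, materials: str, llm: Callable[[str], str]) -> dict:
    """Run a task through checkable stages; review each result before the next call."""
    results = {}
    # Step 1: understanding -- restate goal, list gaps, identify risks.
    results["understanding"] = llm(
        f"Restate the goal, list missing information, identify risks.\nTask: {task}"
    )
    # Step 2: process materials -- facts only, no conclusions yet.
    results["facts"] = llm(
        f"Extract structure and summarize facts only; do not conclude.\nMaterials: {materials}"
    )
    # Step 3: generate candidates -- directions with tradeoffs, not final drafts.
    results["candidates"] = llm(
        "Give 2-3 directions with tradeoffs, not final drafts.\n"
        f"Facts: {results['facts']}"
    )
    # Steps 4-5 (converge, final format) happen after a human reviews the candidates.
    return results
```

Because each stage's output is stored separately, a misunderstanding caught in `understanding` costs one call to fix instead of an entire final draft.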
7. Tools Handle Determinism; Models Handle Orchestration
Mature usage isn’t about the model doing everything—it’s about model plus tools.
Model strengths: explaining, summarizing, generating candidates, orchestrating flow
Model weaknesses: anything requiring hard certainty
Typical division:
- Search/retrieval → find materials
- DB/query tools → get real data, don’t imagine SQL results
- Code execution → calculate, run scripts, verify regex, process CSVs
- Linters/type checkers/test runners → final verdict
- Browser/scrapers/APIs → latest pages, real DOM structures
Principle: model suggests “how,” tools verify “whether correct.”
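A concrete instance of that principle: take a regex the model proposed and run it against known cases before trusting it. A minimal sketch; the helper name, the proposed pattern, and the test strings are all illustrative:

```python
import re


def verify_regex(pattern: str, should_match: list[str], should_not: list[str]) -> bool:
    """Return True only if the pattern behaves correctly on every known case."""
    compiled = re.compile(pattern)
    matches_all = all(compiled.fullmatch(s) for s in should_match)
    rejects_all = not any(compiled.fullmatch(s) for s in should_not)
    return matches_all and rejects_all


# Suppose the model proposed this pattern for plain semantic version strings:
proposed = r"\d+\.\d+\.\d+"
ok = verify_regex(proposed, ["1.0.0", "12.4.9"], ["1.0", "v1.0.0"])
```

The model wrote the pattern; `re` delivers the verdict. The same split applies to SQL (run it), math (compute it), and code (test it).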
8. Iteration Is Normal—First Drafts Are Just Drafts
High-efficiency users treat the model as an iteration object: you give feedback, it rewrites quickly.
Bad feedback:
- Not quite right, try again
- Too AI-like
- Not professional enough
Good feedback:
- Remove the opening buildup, start directly from problem definition
- Section two reads too much like marketing—rewrite in engineering decision tone
- Change “recommendations” to actionable steps, each with responsible party
- Only keep cost-related comparisons, remove brand value talk
- Examples too generic—use B2B SaaS scenario
Models respond well to specific, local, structured feedback. They respond to vague emotion with surface-level pandering.
9. Let the Model Say “I Don’t Know”
Many errors are induced by usage patterns.
If your prompt defaults to requiring complete answers that sound like an expert, the model fills gaps with confident guesses.
Explicitly give it a legitimate path to uncertainty:
If materials cannot support a definite conclusion:
- State what remains uncertain
- Explain what information is missing
- Give minimum verification steps to resolve
Don't write speculation as fact.
This matters especially in:
- Technical troubleshooting
- Legal/compliance matters
- Medical/financial decisions
- Business forecasting
- Root cause analysis
- Industry analysis requiring factual citations
Mature output separates facts, inferences, assumptions, and recommendations—not every sentence tries to sound certain.
10. Build a Small Evaluation Set—Don’t Rely on “Feels Right This Time”
If you’re using models for repeated tasks—customer replies, requirement summarization, code review assistance, article summaries, lead classification—don’t keep judging quality by intuition.
You need a small evaluation set, even if it’s only 20 samples.
Three types of evaluation samples
1. Standard cases: Most common, most expected. Tests baseline stability.
2. Edge cases: Incomplete info, messy phrasing, overly long input, mixed intents. Tests robustness.
3. Negative cases: Intentionally misleading, conflicting information, unsolvable conditions. Tests whether it fabricates or appropriately flags problems.
What to evaluate
Depends on the task, but typically:
- Accuracy
- Completeness
- Format compliance
- Citation correctness
- Fabrication rate
- Constraint handling
- Appropriate refusal/uncertainty flagging
Practical tip: keep the same sample set fixed. Every prompt tweak, model change, or tool chain update—run them all through. You’ll quickly discover that many “feels smarter” changes don’t actually score better on real samples.
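A fixed sample set plus a scorer can be as small as this. The sketch below scores only simple substring expectations, and `run_model` is a stand-in for your actual call; the sample texts and expectation format are illustrative:

```python
from typing import Callable

# A tiny fixed evaluation set: (input, expectations) pairs.
# Expectations here are substring checks; real sets can be richer.
EVAL_SET = [
    ("Summarize: release slipped two weeks due to QA backlog.",
     {"must_contain": ["two weeks"], "must_not_contain": ["on schedule"]}),
    ("Summarize: no root cause identified yet.",
     {"must_contain": ["no root cause"], "must_not_contain": ["fixed"]}),
]


def score(run_model: Callable[[str], str]) -> float:
    """Return the fraction of samples whose output meets every expectation."""
    passed = 0
    for prompt, expect in EVAL_SET:
        output = run_model(prompt).lower()
        ok = all(s in output for s in expect["must_contain"]) and not any(
            s in output for s in expect["must_not_contain"]
        )
        passed += ok
    return passed / len(EVAL_SET)
```

Run the same `EVAL_SET` after every prompt tweak or model swap and compare scores; that comparison is what replaces "feels smarter."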
Without evaluation, all optimization is just personal bias.
11. Common Failure Patterns—Prevent Upfront
1. Task too broad, output necessarily scatters
“Analyze how we can grow our product” has no boundaries. The model can only assemble generic business fluff.
Fix: Narrow to specific goals, audiences, timeframes, constraints, and deliverables.
2. Insufficient materials, demanding conclusions
Only a few user comments, asking for “core user persona.” No logs or reproduction steps, asking for “root cause.”
Fix: Let it first list what’s missing rather than forcing premature conclusions.
3. Expecting the model to make final decisions
Models can help organize options, add perspectives, summarize tradeoffs—but can’t bear responsibility for decisions requiring accountability.
Fix: Position models as “decision support” not “automated decision-making.”
4. Chasing eloquence at the expense of information density
When output “doesn’t sound human enough,” pushing for “more natural, more vivid, more polished” often dilutes the actual information.
Fix: Accuracy first, style second. Work documents prioritize clarity, not beauty.
5. Treating long context as a universal solution
Dumping materials doesn’t mean the model automatically grasps key points. Too much information, too chaotic, and it still misses.
Fix: Pre-organize materials, or explicitly tell it which parts to prioritize.
6. Using models for what should be scripted
Format conversion, rule validation, field mapping, batch renaming, fixed template filling—if rules are clear enough, scripts are more stable and cheaper.
Fix: Automate first; let models only handle what rules can’t cover.
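For example, a field-mapping task—renaming keys in exported records to a new schema—is a few deterministic lines with zero variance. A minimal sketch; the field names are made up:

```python
# Deterministic field mapping: old export keys -> new schema keys.
FIELD_MAP = {"user_name": "username", "e_mail": "email", "signup_ts": "created_at"}


def remap(record: dict) -> dict:
    """Rename known keys; pass unknown keys through unchanged."""
    return {FIELD_MAP.get(key, key): value for key, value in record.items()}
```

A script like this produces identical output on every run; a model asked to do the same renaming will occasionally improvise, and you will pay for each improvisation in review time.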
12. When NOT to Use Models
This part matters more than “how to use.”
Don’t use models alone in these situations
High-determinism calculations: Amounts, tax rates, quotas, statistical definitions, complex logic. Use programs.
High-risk professional judgments: Legal conclusions, medical advice, investment decisions, compliance determinations. Models can help organize materials, can’t replace signed expert responsibility.
Rule-based repetitive work: If you can write a script, SQL, or template engine, don’t add a probabilistic system for variance.
Data that can’t leave its domain: Sensitive data, customer privacy, core code, unreleased business info. Check boundaries, permissions, and anonymization first.
Before the organization is ready to bear error costs: If downstream assumes “model said it, so it’s final,” don’t wire it into critical workflows.
Bottom line: Models work best for verifiable, reversible, correctable work; worst for irreversible, un-auditable, non-accountable decisions.
13. Practical Prompt Templates
These aren’t magic answers, but they’re far better than “help me with this.”
1. Writing: Structure First, Draft Second
You're my writing assistant, but don't invent facts.
Task:
Write an internal team analysis based on materials I provide.
Requirements:
- First output article structure, not full draft
- One core judgment sentence per section
- Mark which judgments have material support, which need more evidence
- I'll confirm structure before you expand to full draft
2. Information Organization: Mandatory Attribution
Organize conclusions only from materials below.
Output format:
- Conclusion
- Source excerpt
- Risks or exceptions
- Uncertainties
Rules:
- Don't write conclusions without supporting evidence
- When sources conflict, list the conflict—don't force reconciliation
3. Requirement Clarification: Ask for Gaps First
I want to build a new feature. Don't give me solutions yet.
First, do two things:
1. Restate your understanding of the goal
2. List what information is still missing for a reliable solution
Requirements:
- Categorize by: users, scenario, constraints, data, technology, launch risks
- No generic recommendations
4. Code Collaboration: Plan First, Then Modify
You'll help me modify an existing project, but don't make large changes yet.
First output:
1. Your understanding of the requirement
2. Modules that might be affected
3. Minimum change plan
4. Risk points
5. Questions needing confirmation
After confirmation, provide specific modification suggestions.
5. Troubleshooting: Separate Facts from Hypotheses
Below are error logs and reproduction steps.
Don't assert root cause directly.
Output format:
- Known facts
- Three most likely hypotheses
- Verification method for each hypothesis
- Recommended investigation order
14. A More Realistic Workflow
If you genuinely want to integrate GPT into daily work, try this sequence:
- Define task: Clear goal, audience, constraints, acceptance criteria
- Prepare materials: Source docs, code, data, samples, screenshots
- First-round clarification: Have model restate task and list gaps
- Structured processing: Extract facts first, then generate candidates
- Tool verification: Search, execute, test, query
- Second-round revision: Directed rewrite based on issues
- Human sign-off: Final judgment and accountability confirmation
- Build evaluation set: Save typical samples as future benchmarks
This looks more complicated than “just ask,” but for anything that matters, it saves time. You stop wrestling with outputs that look polished but are unreliable.
Closing Thoughts
The real value of GPT-style models isn’t “thinking for you”—it’s making many low-speed, low-frequency, hard-to-start cognitive tasks into something you can quickly prototype, iterate, and compare.
But the prerequisite is accepting an unsexy reality: it’s not magic, it’s a system; not an expert, it’s a tool; not an answer itself, but one step in an answer production line.
People who use it well aren’t necessarily the best at writing prompts—they’re clearest about task boundaries, best at providing materials, most willing to build verification mechanisms.
Don’t ask “can this replace me?”
More useful question:
In this work, which parts are suitable to delegate to a fast-but-mistake-prone assistant, and which parts must I personally oversee?
When you figure that out, the model finally starts being genuinely useful.