Claude 3 After One Month: Sonnet Is the Real Coding Model
Bottom Line First
When Claude 3 dropped, the internet exploded with benchmarks. But most coverage was just press-release summaries; nobody was saying what the models are actually like to use.
I spent about $200 testing all three Claude 3 models (Sonnet, Haiku, Opus) extensively. The conclusion surprised me:
Sonnet is the coding champion. Not Opus. Sonnet.
Why Not Opus
Opus is the most expensive at $15 per million input tokens, but everyday programming tasks don't need it.
Real test: I had all three refactor a messy 500-line Python script (add type hints, write docstrings, split it into modules).
| Model | Time | Score (1-10) | Cost |
|---|---|---|---|
| Opus | 18s | 9 | $0.15 |
| Sonnet | 12s | 9 | $0.003 |
| Haiku | 8s | 6 | $0.00025 |
Sonnet's output quality was practically identical to Opus's, but at 1/50th the cost.
Opus's real strength is complex, multi-step reasoning (all three Claude 3 models share the 200k-token context window, so long context isn't an Opus exclusive). For everyday coding tasks, Sonnet is more than enough, and much faster.
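If you want to sanity-check cost comparisons like this yourself, per-request cost is just tokens times the per-million rate. A quick sketch; the rates below are the launch-era list prices as I recall them, and the token counts are purely illustrative, so verify both against Anthropic's current pricing page:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    """Estimate one request's cost; rates are USD per million tokens."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Launch-era list prices (USD per million tokens) -- verify before relying on them.
RATES = {
    "opus":   (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku":  (0.25, 1.25),
}

# A 500-line refactor: roughly 6k tokens in, 6k tokens out (made-up numbers).
for model, (in_rate, out_rate) in RATES.items():
    print(f"{model}: ${request_cost(6_000, 6_000, in_rate, out_rate):.4f}")
```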
Where Sonnet Excels
1. Code Understanding
Tested with a classic exercise: take a Django view function with no comments and no type annotations, and have the model refactor it and add tests.
Sonnet understood the function’s intent correctly, added type annotations, and caught a potential SQL injection vulnerability. Opus caught it too—but Sonnet was faster.
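For reference, the standard fix for that injection class is a parameterized query instead of string concatenation. A framework-agnostic sketch using sqlite3 (the table and column names are made up for illustration, not from the actual Django code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (user_id INTEGER, bio TEXT)")
conn.execute("INSERT INTO profiles VALUES (1, 'old bio')")

# Vulnerable pattern -- attacker-controlled input spliced into the SQL string:
#   "UPDATE profiles SET bio = '" + bio + "' WHERE user_id = " + user_id
bio = "x'; DROP TABLE profiles; --"

# Safe pattern: placeholders let the driver treat the value as data, not SQL.
conn.execute("UPDATE profiles SET bio = ? WHERE user_id = ?", (bio, 1))
row = conn.execute("SELECT bio FROM profiles WHERE user_id = 1").fetchone()
print(row[0])  # the malicious string is stored as plain text; no table dropped
```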
```python
# Original code (after Claude 3 Sonnet analysis)
# This is a user profile update view
# Issue 1: SQL concatenation has injection risk
# Issue 2: Missing ownership check
# Issue 3: No CSRF verification
@login_required
def update_profile(request):
    ...
```

2. Test Generation
Had Sonnet generate test cases for a FastAPI endpoint:
```python
# Sonnet-generated tests
class TestUserEndpoint:
    def test_get_user_success(self):
        # Happy path
        response = client.get("/users/1")
        assert response.status_code == 200
        assert "name" in response.json()

    def test_get_user_not_found(self):
        # 404 path
        response = client.get("/users/999")
        assert response.status_code == 404

    def test_get_user_unauthorized(self):
        # Unauthorized
        response = client.get("/users/1", headers={})
        assert response.status_code == 401
```

It covered the happy path and the edge cases, and used pytest fixtures correctly.
3. Code Explanation
Threw a gnarly regex at it:
Explain this "looks like gibberish" regex:

Pattern: `(?<!\.)(?<![A-Z][a-z])(?<!\b\d{3})(?<!\d{3}-\d{2})(?<!\d{3}-\d{2}-)\b\d{3}-\d{2}-\d{4}\b`

Sonnet: "This matches US Social Security Numbers (SSNs)..."

It wasn't intimidated: it correctly identified the pattern as SSN matching and pointed out several potential issues with the original regex.
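You can check the explanation directly: the negative lookbehinds reject strings that are merely SSN-shaped, such as digits glued onto a longer number or a match preceded by a dot. The sample strings here are my own:

```python
import re

SSN_RE = re.compile(
    r"(?<!\.)(?<![A-Z][a-z])(?<!\b\d{3})"
    r"(?<!\d{3}-\d{2})(?<!\d{3}-\d{2}-)"
    r"\b\d{3}-\d{2}-\d{4}\b"
)

# Only the standalone SSN matches; the other two are rejected by the
# word boundary and the (?<!\.) lookbehind respectively.
text = "SSN: 123-45-6789, but not 1123-45-6789 and not v1.123-45-6789"
print(SSN_RE.findall(text))  # ['123-45-6789']
```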
Haiku’s Real Use Case
Haiku isn’t for writing code—it’s for querying code.
At $0.25 per million input tokens, it's cheap enough to use freely. Throw a 500-line file at it, ask which function most likely has bugs, and it answers in 8 seconds; in my testing it's right about 70% of the time.
For quick code reviews and exploring unfamiliar codebases, Haiku is a steal.
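The pattern I use is dead simple: stuff the whole file into one prompt and ask one pointed question. A sketch of the prompt construction (the helper name and example file are mine; send the result with the `anthropic` SDK or whatever client you use):

```python
def bug_hunt_prompt(filename: str, source: str) -> str:
    """Build a single-question code-query prompt for a cheap model like Haiku."""
    return (
        f"Here is {filename}:\n\n"
        + source
        + "\n\nWhich function most likely has bugs, and why? Answer briefly."
    )

# Hypothetical usage; pass the returned string to your API client.
prompt = bug_hunt_prompt("views.py", "def f(x):\n    return x / 0\n")
print(prompt[:20])
```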
vs GPT-4
Everyone’s burning question. My real experience:
| Scenario | GPT-4 | Claude 3 Sonnet |
|---|---|---|
| Write new code | Strong | Strong |
| Read existing code | Medium | Strong |
| Bug finding | Medium | Strong |
| Test generation | Strong | Strong |
| Code explanation | Strong | Strong |
| Speed | Slow | 5x faster |
| Cost | High | 10x lower |
GPT-4 is slightly better at writing new code, but significantly weaker at reading code and finding bugs. For teams doing lots of maintenance and understanding existing codebases, Sonnet’s price/performance is way better.
Actual Workflow
My current setup:
- Sonnet: primary—writing code, code review, test generation
- Haiku: quick code exploration, unfamiliar codebase investigation
- GPT-4: only when Sonnet struggles (happens about 1-2 times per week)
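The bullet list above is effectively a routing table, and in scripts I encode it directly. A sketch; the model ID strings were current when I wrote this, so treat them as assumptions to verify:

```python
# Task-to-model routing mirroring the workflow above.
ROUTES = {
    "write_code": "claude-3-sonnet-20240229",
    "code_review": "claude-3-sonnet-20240229",
    "test_generation": "claude-3-sonnet-20240229",
    "explore_codebase": "claude-3-haiku-20240307",
    "fallback": "gpt-4",  # only when Sonnet struggles
}

def pick_model(task: str) -> str:
    """Return the model ID for a task, defaulting to the Sonnet workhorse."""
    return ROUTES.get(task, ROUTES["write_code"])

print(pick_model("explore_codebase"))  # claude-3-haiku-20240307
```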
Conclusion
Among Claude 3's three models, Sonnet is the sweet spot for programming. Opus is overkill and overpriced; Haiku is cheap but only handles simple tasks.
If you’re evaluating: use Sonnet for everyday coding. It’ll remain the standard for coding assistance for a long time.
Oh, and the first draft of this article was reviewed by Sonnet.