Claude 3 After One Month: Sonnet Is the Real Coding Model

Bottom Line First

When Claude 3 dropped, the internet exploded with benchmarks. But most coverage was just press-release summaries; nobody was saying what it's actually like to use.

I spent about $200 testing all three Claude 3 models (Sonnet, Haiku, Opus) extensively. The conclusion surprised me:

Sonnet is the coding champion. Not Opus. Sonnet.

Why Not Opus

Opus is the most expensive at $15/M input tokens. But programming tasks don’t need Opus.

Real test: had all three refactor a 500-line messy Python script—add type hints, docstrings, split into modules.

Model     Time   Score (1-10)   Cost
Opus      18s    9              $0.15
Sonnet    12s    9              $0.003
Haiku     8s     6              $0.00025

Sonnet's output quality was practically identical to Opus's, at 1/50th the cost for this task.

Opus's real strength is complex, multi-step reasoning; the 200K-token context window is shared by all three Claude 3 models, so it isn't an Opus exclusive. For everyday coding tasks, Sonnet is more than enough, and way faster.
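For intuition on how those per-task costs fall out of per-token prices, here's a small sketch. The per-million rates are Anthropic's published Claude 3 prices; the token counts are made-up round numbers, so the figures are illustrative rather than a reproduction of the table above.

```python
# Rough per-task cost: (input_tokens * in_rate + output_tokens * out_rate) / 1M.
# Rates are Anthropic's published Claude 3 prices per million tokens;
# the token counts below are hypothetical round numbers for a 500-line refactor.
PRICES = {  # model: (input $/M tokens, output $/M tokens)
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku": (0.25, 1.25),
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request against the given model."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Assume ~5K tokens in (script + prompt) and ~1K tokens out.
for model in PRICES:
    print(f"{model}: ${task_cost(model, 5_000, 1_000):.4f}")
```

The output rate dominates for long completions, which is why refactors (lots of generated code) cost noticeably more than short Q&A on the same file.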

Where Sonnet Excels

1. Code Understanding

Tested with a classic: take a Django view function with no comments and no type annotations, and have the model refactor it and add tests.

Sonnet understood the function’s intent correctly, added type annotations, and caught a potential SQL injection vulnerability. Opus caught it too—but Sonnet was faster.

# Original code (after Claude 3 Sonnet analysis)
# This is a user profile update view
# Issue 1: SQL concatenation has injection risk
# Issue 2: Missing ownership check
# Issue 3: No CSRF verification

@login_required
def update_profile(request):
    ...
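Issue 1 deserves a concrete illustration. A framework-free sketch with sqlite3 (table and data invented for the demo) shows why concatenated SQL is dangerous and how a parameterized query closes the hole:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (id INTEGER PRIMARY KEY, bio TEXT)")
conn.execute("INSERT INTO profiles (id, bio) VALUES (1, 'hello')")
conn.execute("INSERT INTO profiles (id, bio) VALUES (2, 'world')")

# Vulnerable pattern: attacker-controlled input concatenated into SQL.
user_input = "1 OR 1=1"
vulnerable_sql = "SELECT bio FROM profiles WHERE id = " + user_input
rows = conn.execute(vulnerable_sql).fetchall()  # returns every row, not one

# Safe pattern: placeholder plus parameter tuple; the driver treats the
# whole string as a single value, so the injected OR never becomes SQL.
safe_rows = conn.execute(
    "SELECT bio FROM profiles WHERE id = ?", ("1 OR 1=1",)
).fetchall()  # empty: no row has that literal id

print(len(rows), len(safe_rows))
```

The same principle applies in Django: use the ORM or pass `params` to `raw()`/`cursor.execute()` instead of building the query string by hand.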

2. Test Generation

Had Sonnet generate test cases for a FastAPI endpoint:

# Sonnet-generated tests (imports and client fixture reconstructed;
# the app import path is hypothetical)
import pytest
from fastapi.testclient import TestClient

from app.main import app  # hypothetical application module


@pytest.fixture
def client():
    return TestClient(app)


class TestUserEndpoint:
    def test_get_user_success(self, client):
        # Happy path
        response = client.get("/users/1")
        assert response.status_code == 200
        assert "name" in response.json()

    def test_get_user_not_found(self, client):
        # 404 path
        response = client.get("/users/999")
        assert response.status_code == 404

    def test_get_user_unauthorized(self, client):
        # Unauthorized: no credentials supplied
        response = client.get("/users/1", headers={})
        assert response.status_code == 401

It covered the happy path and edge cases, and used pytest fixtures correctly.

3. Code Explanation

Threw a gnarly regex at it:

Explain this "looks like gibberish" regex:

Pattern: (?<!\.)(?<![A-Z][a-z])(?<!\b\d{3})(?<!\d{3}-\d{2})(?<!\d{3}-\d{2}-)\b\d{3}-\d{2}-\d{4}\b

Sonnet: This matches US Social Security numbers (SSNs)...

It wasn’t intimidated, correctly identified it as SSN matching, and pointed out several potential issues with the original regex.
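Out of curiosity I sanity-checked the pattern myself; the test strings below are mine, not Sonnet's:

```python
import re

# US SSN matcher: the lookbehinds reject a leading '.', a preceding
# capital-lowercase pair, and digit runs that would make the match a
# fragment of a longer number.
SSN = re.compile(
    r"(?<!\.)(?<![A-Z][a-z])(?<!\b\d{3})"
    r"(?<!\d{3}-\d{2})(?<!\d{3}-\d{2}-)"
    r"\b\d{3}-\d{2}-\d{4}\b"
)

print(bool(SSN.search("SSN on file: 123-45-6789")))  # True
print(bool(SSN.search("serial 1123-45-6789")))       # False: inside a longer number
print(bool(SSN.search("version 1.123-45-6789")))     # False: preceded by '.'
```

Note that Python only accepts these lookbehinds because each one is fixed-width; a variable-width guard would need a different construction.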

Haiku’s Real Use Case

Haiku isn’t for writing code—it’s for querying code.

At $0.25/M input tokens, it's cheap enough to use freely. Throw a 500-line file at it, ask "which function most likely has bugs," and it gives you an answer in 8 seconds at roughly 70% accuracy.

For quick code reviews and exploring unfamiliar codebases, Haiku is a steal.
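My quick-look loop is a single API call. Here's a minimal sketch using the official anthropic Python SDK; the prompt wording is mine, and the network call is commented out so the snippet runs without an API key:

```python
# One-shot "ask Haiku about a file" helper. The prompt wording is my own;
# the SDK call at the bottom is commented out so this runs offline.
def build_code_question(source: str, question: str) -> dict:
    """Build the kwargs for a Messages API call asking about some code."""
    return {
        "model": "claude-3-haiku-20240307",
        "max_tokens": 512,
        "messages": [
            {
                "role": "user",
                "content": f"{question}\n\n```python\n{source}\n```",
            }
        ],
    }


snippet = "def add(a, b):\n    return a - b  # bug: subtracts\n"
request = build_code_question(snippet, "Which function most likely has a bug?")

# import anthropic
# reply = anthropic.Anthropic().messages.create(**request)
# print(reply.content[0].text)
```

Keeping `max_tokens` small here is deliberate: for "where should I look" questions, a short answer is the point, and it keeps the already-tiny cost tinier.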

vs GPT-4

Everyone’s burning question. My real experience:

Scenario             GPT-4    Claude 3 Sonnet
Write new code       Strong   Strong
Read existing code   Medium   Strong
Bug finding          Medium   Strong
Test generation      Strong   Strong
Code explanation     Strong   Strong
Speed                Slow     5x faster
Cost                 High     10x lower

GPT-4 is slightly better at writing new code, but significantly weaker at reading code and finding bugs. For teams doing lots of maintenance and understanding existing codebases, Sonnet’s price/performance is way better.

Actual Workflow

My current setup:

  • Sonnet: primary—writing code, code review, test generation
  • Haiku: quick code exploration, unfamiliar codebase investigation
  • GPT-4: only when Sonnet struggles (happens about 1-2 times per week)
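If you script this routing, it boils down to a lookup with a fallback; the task labels are my own shorthand (the model IDs are the real Claude 3 API identifiers):

```python
# Sketch of the routing rule above as code; task labels are my own.
MODEL_FOR_TASK = {
    "write": "claude-3-sonnet-20240229",
    "review": "claude-3-sonnet-20240229",
    "tests": "claude-3-sonnet-20240229",
    "explore": "claude-3-haiku-20240307",
}


def pick_model(task: str, sonnet_failed: bool = False) -> str:
    # Fallback mirrors "GPT-4 only when Sonnet struggles".
    if sonnet_failed:
        return "gpt-4"
    return MODEL_FOR_TASK.get(task, "claude-3-sonnet-20240229")


print(pick_model("explore"))  # claude-3-haiku-20240307
```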

Conclusion

Among Claude 3’s three models, Sonnet is the sweet spot for programming. Opus is overkill and overpriced, Haiku is cheap but only handles simple tasks.

If you’re evaluating: use Sonnet for everyday coding. It’ll remain the standard for coding assistance for a long time.

Oh, and the first draft of this article was reviewed by Sonnet.