Claude 3 After One Month: Sonnet Is the Real Coding Model
Bottom Line First
When Claude 3 dropped, the internet exploded with benchmarks. But most coverage was just press-release summaries; nobody was saying what the models are actually like to use.
I spent about $200 testing all three Claude 3 models (Sonnet, Haiku, Opus) extensively. The conclusion surprised me:
Sonnet is the coding champion. Not Opus. Sonnet.
Why Not Opus
Opus is the most expensive at $15 per million input tokens, but everyday programming tasks don't need it.
Real test: I had all three refactor a messy 500-line Python script (add type hints, write docstrings, split it into modules).
| Model | Time | Score (1-10) | Cost |
|---|---|---|---|
| Opus | 18s | 9 | $0.15 |
| Sonnet | 12s | 9 | $0.003 |
| Haiku | 8s | 6 | $0.00025 |
Sonnet's output quality was practically identical to Opus's, but at 1/50th the cost.
Opus's real strength is complex, multi-step reasoning (all three Claude 3 models share the 200k-token context window, so long context isn't an Opus exclusive). For everyday coding tasks, Sonnet is more than enough, and much faster.
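If you want to sanity-check cost comparisons like this yourself, per-request cost is just tokens times the per-million rate. A quick sketch; the rates below are the launch-era list prices as I recall them, and the token counts are purely illustrative, so verify both against Anthropic's current pricing page:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    """Estimate one request's cost; rates are USD per million tokens."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Launch-era list prices (USD per million tokens) -- verify before relying on them.
RATES = {
    "opus":   (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku":  (0.25, 1.25),
}

# A 500-line refactor: roughly 6k tokens in, 6k tokens out (made-up numbers).
for model, (in_rate, out_rate) in RATES.items():
    print(f"{model}: ${request_cost(6_000, 6_000, in_rate, out_rate):.4f}")
```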
Where Sonnet Excels
1. Code Understanding
Tested with a classic exercise: take a Django view function with no comments and no type annotations, and have the model refactor it and add tests.
Sonnet understood the function’s intent correctly, added type annotations, and caught a potential SQL injection vulnerability. Opus caught it too—but Sonnet was faster.
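For reference, the standard fix for that injection class is a parameterized query instead of string concatenation. A framework-agnostic sketch using sqlite3 (the table and column names are made up for illustration, not from the actual Django code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (user_id INTEGER, bio TEXT)")
conn.execute("INSERT INTO profiles VALUES (1, 'old bio')")

# Vulnerable pattern -- attacker-controlled input spliced into the SQL string:
#   "UPDATE profiles SET bio = '" + bio + "' WHERE user_id = " + user_id
bio = "x'; DROP TABLE profiles; --"

# Safe pattern: placeholders let the driver treat the value as data, not SQL.
conn.execute("UPDATE profiles SET bio = ? WHERE user_id = ?", (bio, 1))
row = conn.execute("SELECT bio FROM profiles WHERE user_id = 1").fetchone()
print(row[0])  # the malicious string is stored as plain text; no table dropped
```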
```python
# Original code (after Claude 3 Sonnet analysis)
# This is a user profile update view
# Issue 1: SQL concatenation has injection risk
# Issue 2: Missing ownership check
# Issue 3: No CSRF verification
@login_required
def update_profile(request):
    ...
```

2. Test Generation
Had Sonnet generate test cases for a FastAPI endpoint:
```python
# Sonnet-generated tests
class TestUserEndpoint:
    def test_get_user_success(self):
        # Happy path
        response = client.get("/users/1")
        assert response.status_code == 200
        assert "name" in response.json()

    def test_get_user_not_found(self):
        # 404 path
        response = client.get("/users/999")
        assert response.status_code == 404

    def test_get_user_unauthorized(self):
        # Unauthorized
        response = client.get("/users/1", headers={})
        assert response.status_code == 401
```

It covered the happy path and the edge cases, and used pytest fixtures correctly.
3. Code Explanation
Threw a gnarly regex at it:
Explain this "looks like gibberish" regex:

Pattern: `(?<!\.)(?<![A-Z][a-z])(?<!\b\d{3})(?<!\d{3}-\d{2})(?<!\d{3}-\d{2}-)\b\d{3}-\d{2}-\d{4}\b`

Sonnet: "This matches US Social Security Numbers (SSNs)..."

It wasn't intimidated: it correctly identified the pattern as SSN matching and pointed out several potential issues with the original regex.
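You can check the explanation directly: the negative lookbehinds reject strings that are merely SSN-shaped, such as digits glued onto a longer number or a match preceded by a dot. The sample strings here are my own:

```python
import re

SSN_RE = re.compile(
    r"(?<!\.)(?<![A-Z][a-z])(?<!\b\d{3})"
    r"(?<!\d{3}-\d{2})(?<!\d{3}-\d{2}-)"
    r"\b\d{3}-\d{2}-\d{4}\b"
)

# Only the standalone SSN matches; the other two are rejected by the
# word boundary and the (?<!\.) lookbehind respectively.
text = "SSN: 123-45-6789, but not 1123-45-6789 and not v1.123-45-6789"
print(SSN_RE.findall(text))  # ['123-45-6789']
```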
Haiku’s Real Use Case
Haiku isn’t for writing code—it’s for querying code.
At $0.25 per million input tokens, it's cheap enough to use freely. Throw a 500-line file at it, ask which function most likely has bugs, and it answers in 8 seconds; in my testing it's right about 70% of the time.
For quick code reviews and exploring unfamiliar codebases, Haiku is a steal.
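The pattern I use is dead simple: stuff the whole file into one prompt and ask one pointed question. A sketch of the prompt construction (the helper name and example file are mine; send the result with the `anthropic` SDK or whatever client you use):

```python
def bug_hunt_prompt(filename: str, source: str) -> str:
    """Build a single-question code-query prompt for a cheap model like Haiku."""
    return (
        f"Here is {filename}:\n\n"
        + source
        + "\n\nWhich function most likely has bugs, and why? Answer briefly."
    )

# Hypothetical usage; pass the returned string to your API client.
prompt = bug_hunt_prompt("views.py", "def f(x):\n    return x / 0\n")
print(prompt[:20])
```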
vs GPT-4
Everyone’s burning question. My real experience:
| Scenario | GPT-4 | Claude 3 Sonnet |
|---|---|---|
| Write new code | Strong | Strong |
| Read existing code | Medium | Strong |
| Bug finding | Medium | Strong |
| Test generation | Strong | Strong |
| Code explanation | Strong | Strong |
| Speed | Slow | 5x faster |
| Cost | High | 10x lower |
GPT-4 is slightly better at writing new code, but significantly weaker at reading code and finding bugs. For teams doing lots of maintenance and understanding existing codebases, Sonnet’s price/performance is way better.
Actual Workflow
My current setup:
- Sonnet: primary—writing code, code review, test generation
- Haiku: quick code exploration, unfamiliar codebase investigation
- GPT-4: only when Sonnet struggles (happens about 1-2 times per week)
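The bullet list above is effectively a routing table, and in scripts I encode it directly. A sketch; the model ID strings were current when I wrote this, so treat them as assumptions to verify:

```python
# Task-to-model routing mirroring the workflow above.
ROUTES = {
    "write_code": "claude-3-sonnet-20240229",
    "code_review": "claude-3-sonnet-20240229",
    "test_generation": "claude-3-sonnet-20240229",
    "explore_codebase": "claude-3-haiku-20240307",
    "fallback": "gpt-4",  # only when Sonnet struggles
}

def pick_model(task: str) -> str:
    """Return the model ID for a task, defaulting to the Sonnet workhorse."""
    return ROUTES.get(task, ROUTES["write_code"])

print(pick_model("explore_codebase"))  # claude-3-haiku-20240307
```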
Conclusion
Among Claude 3's three models, Sonnet is the sweet spot for programming. Opus is overkill and overpriced; Haiku is cheap but only handles simple tasks.
If you’re evaluating: use Sonnet for everyday coding. It’ll remain the standard for coding assistance for a long time.
Oh, and the first draft of this article was reviewed by Sonnet.