2024 AI Coding: Pits I Fell Into and Lessons I Learned

Bottom Line First

At the start of 2024, I was skeptical about AI coding. By year's end, I was using it every day.

The biggest change wasn’t any single tool—it was AI coding going from “usable” to “actually good.”

Moments That Changed How I Work

1. Claude 3 Sonnet Released (March)

This was the turning point. Before it, I used GPT-4 for coding assistance: $30 per million input tokens, fine for chatting, but weak at writing actual code.

Sonnet arrived at $3 per million input tokens, a tenth of the price, yet stronger for coding.

From that day, my API bill dropped about 70%.
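As a rough sanity check on those numbers (the monthly volume below is made up; the per-token prices are the published 2024 list prices for GPT-4 and Claude 3 Sonnet input):

```python
# Illustrative monthly volume (made-up number): 50M input tokens
tokens_millions = 50

gpt4_cost = tokens_millions * 30.0    # GPT-4: $30 per million input tokens
sonnet_cost = tokens_millions * 3.0   # Claude 3 Sonnet: $3 per million input tokens

savings = 1 - sonnet_cost / gpt4_cost
print(f"${gpt4_cost:.0f} -> ${sonnet_cost:.0f} ({savings:.0%} cheaper)")
```

The raw price cut is 90%; a real bill falls less than that because not every request moves over to the cheaper model.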

2. Cursor Went Mainstream (Mid-year)

Cursor wasn’t the first AI IDE, but it was the first one people actually migrated to.

I'd tried earlier AI-assisted editors, including GitHub Copilot inside VS Code, and the experience was mediocre. Cursor in 2024 was a completely different product.

Cmd+K for inline AI editing is way more efficient than chat interfaces.

3. Claude Code Released (Mid-year)

Anthropic’s CLI tool.

I thought it was a gimmick at first. After a week of use, it's the strongest codebase-analysis tool I've tried.

With 200k tokens of context, you can throw an entire codebase at it. No IDE needed, just a terminal.

# A scenario Copilot and Cursor can't handle
claude
> In this 50k-line codebase, which module most likely has issues?

# Claude Code actually analyzes the code and gives an insightful answer

4. The Devin Reality

Devin was announced as “the first AI software engineer,” at $500/month.

Tried it for two months. Verdict:

  • Can do: small isolated tasks (write a script, automate a workflow)
  • Can’t do: multi-file changes requiring business context, architectural decisions

Devin works as “outsourced labor,” not a daily dev tool.

Pits I Fell Into

Pit 1: Over-relying on AI-generated Tests

AI-generated tests often have high coverage, but they test what the AI inferred the logic to be, not the real business intent.

# AI-written tests; they look like they cover everything
def test_calculate_discount():
    assert calculate_discount(100, 10) == 90   # 10% off
    assert calculate_discount(100, 0) == 100   # no discount
    assert calculate_discount(0, 50) == 0      # zero amount

# But they missed a business rule: the discount cannot exceed 50% of the order total
# The AI didn't know the rule, the tests passed, and the bug shipped

Lesson: review AI-written tests, especially edge cases.
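For the record, here is what a rule-aware version and the missing test could look like. This is a sketch: the signature of `calculate_discount` and the 50% cap are taken from the story above, not from real production code.

```python
def calculate_discount(amount, percent):
    # Business rule the AI never saw: a discount can't exceed 50% of the order total
    effective = min(percent, 50)
    return amount * (100 - effective) / 100

def test_discount_cap():
    assert calculate_discount(100, 80) == 50   # capped at 50%, not 80%
    assert calculate_discount(100, 10) == 90   # normal case still works
```

The point is not this particular cap; it's that the edge case worth testing is exactly the one the AI had no way to know about.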

Pit 2: AI Doesn’t Understand “Why”

Give AI a task, and it delivers. Ask “why did you do it this way?” and it often gives a plausible-sounding but incorrect explanation.

# AI wrote this
def process_data(data):
    # AI's explanation: "Using a dict instead of a list because it's faster for lookups"
    # Reality: there are only ~100 items, so a list would be fine
    # The AI is inventing a justification after the fact
    return {item["id"]: item for item in data}

Lesson: don't ask AI to justify its code after the fact. Instead, ask up front: “what are the implications of this design decision?”
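To see why “faster lookups” is a non-reason at this scale, here's a quick illustrative measurement (the 100-item size comes from the example above):

```python
import timeit

# Illustrative scale from the story: ~100 items
items = list(range(100))
as_set = set(items)

# Time 100k membership checks against each structure
list_time = timeit.timeit(lambda: 99 in items, number=100_000)
set_time = timeit.timeit(lambda: 99 in as_set, number=100_000)

# The set wins per operation, but at this size both totals are far below
# anything a user would ever notice
print(f"list: {list_time:.3f}s  set: {set_time:.3f}s for 100k lookups")
```

The asymptotic argument is true and irrelevant here, which is exactly the kind of distinction the AI's post-hoc explanation glosses over.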

Pit 3: Context Loss

After coding with AI all day, I'd realize its suggestions no longer matched the project's overall design.

AI only sees code in your current conversation, not the full picture.

Lesson: for complex tasks, have AI read the entire codebase first, then start modifying.

Real Numbers

My AI coding usage stats this year (estimates):

Task Type              AI Success Rate   Human Fix Needed
Write tests            85%               30%
Simple functions       95%               10%
Code refactoring       70%               40%
Complex architecture   30%               80%
Bug fixes              60%               50%
Code explanation       90%               5%

Conclusion: AI coding is best for “do more, modify less” tasks. Complex architecture decisions are still on the human.

2025 Predictions

Will Happen

  1. MCP (Model Context Protocol) becomes a standard: Anthropic's protocol, announced in November, gives AI agents a standardized way to connect to external tools
  2. Cursor’s moat faces challenges: Copilot and Claude Code are catching up
  3. AI coding diverges: lightweight tools (completion, simple tasks) vs heavy tools (codebase analysis, complex refactoring)

Won’t Happen

  1. AI won’t replace programmers: but programmers who use AI will replace those who don’t
  2. Devin-style full-flow Agents won’t go mainstream: expensive and slow, good for specific scenarios only
  3. AI coding won’t eliminate bugs: may introduce harder-to-find ones instead

Conclusion

The significance of AI coding in 2024: it freed programmers from roughly half of the mechanical work.

Writing tests, simple functions, code explanation, routine refactoring—these used to consume huge portions of a programmer’s day. AI does them well enough.

But AI coding has limits: it doesn’t understand business logic, architecture, or “why.”

In 2025, the productivity gap between programmers who can use AI coding and those who can’t will widen further.