
AI-Assisted Code Review: Real Feedback After One Year

Context

Our team started using AI systematically for code review in mid-2024.

The setup: each PR submission triggers an AI Review Bot that analyzes the code changes and comments on the PR. Human reviewers only look at the AI’s comments.

After a year, we have enough data.

Issues AI Catches Well

1. Security Vulnerabilities (Extremely Effective)

AI is surprisingly strong at finding security issues.

# AI caught: SQL Injection
def get_user(request, user_id):
    query = f"SELECT * FROM users WHERE id = {user_id}"
    # AI comment: ⚠️ Directly concatenating user input into SQL, SQL injection risk
    # Suggestion: use parameterized query

AI catches: SQL injection, XSS, sensitive data leakage, hardcoded credentials. These issues are easy for humans to miss, but AI scanning is consistent.
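
The parameterized-query suggestion from the AI comment above could look roughly like this. It is a minimal sketch, assuming Python's built-in sqlite3 driver and a hypothetical users table with id and name columns; the original post does not say which database or driver the team uses.

```python
import sqlite3


def get_user(conn, user_id):
    """Look up a user by id with a parameterized query."""
    # The ? placeholder makes the driver bind user_id as data,
    # so it is never spliced into the SQL text.
    cur = conn.execute("SELECT id, name FROM users WHERE id = ?", (user_id,))
    return cur.fetchone()


# Hypothetical in-memory table, only here to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")

print(get_user(conn, 1))
# An injection attempt is compared as a plain value and matches no row.
print(get_user(conn, "1 OR 1=1"))
```

With the f-string version, the second call would have turned the WHERE clause into a condition that is always true.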

2. Obvious Logic Errors

# AI caught: missing boundary condition
def calculate_discount(price, discount_percent):
    if discount_percent > 100:
        return 0  # this check exists
    
    # but missing: price < 0 case
    return price * (1 - discount_percent / 100)

# AI comment: ⚠️ price could be negative, not handled
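
One way to act on that comment is sketched below. It assumes that raising ValueError on negative inputs is acceptable to callers, which the original post does not state; the existing behavior of returning 0 for discounts above 100 is kept as is.

```python
def calculate_discount(price, discount_percent):
    """Return the discounted price, rejecting negative inputs."""
    # Assumption: negative inputs are caller errors, so fail loudly.
    if price < 0:
        raise ValueError("price must be non-negative")
    if discount_percent < 0:
        raise ValueError("discount_percent must be non-negative")
    if discount_percent > 100:
        return 0  # original check, unchanged
    return price * (1 - discount_percent / 100)


print(calculate_discount(200, 25))
```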

3. Code Duplication and Bad Smells

# AI caught: this code is nearly identical to process_order above
# Suggestion: extract common function
def ship_order(order_id):
    # 97% duplicate with process_order
    pass

Issues AI Completely Misses

1. Business Logic Errors

This is the biggest blind spot.

AI doesn’t know your company’s business rules. It can only check whether code is “logically correct,” not whether it “meets business requirements.”

# AI didn't catch this (but it's actually a bug)
def apply_coupon(order_total, coupon):
    if coupon.type == "percentage":
        return order_total * (1 - coupon.value / 100)
    # looks fine...
    return order_total

# But the business rule is: a coupon cannot exceed 50% of order_total
# AI doesn't know this rule, so it missed the bug
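
For illustration, here is one way the 50% rule could be written into the code. This is a sketch under assumptions: the Coupon class is hypothetical, and the rule is read as "cap the discount at half the order total" rather than "reject the coupon", which the post does not specify.

```python
from dataclasses import dataclass

# From the business rule above: a coupon may cover at most 50% of the order.
MAX_COUPON_SHARE = 0.5


@dataclass
class Coupon:
    """Hypothetical coupon shape; the real model is not shown in the post."""
    type: str
    value: float


def apply_coupon(order_total, coupon):
    if coupon.type != "percentage":
        return order_total
    discount = order_total * coupon.value / 100
    # Assumption: an oversized coupon is capped, not rejected.
    discount = min(discount, order_total * MAX_COUPON_SHARE)
    return order_total - discount


print(apply_coupon(100.0, Coupon("percentage", 80)))
```

Putting the limit in a named constant also gives an AI reviewer something visible to check against, which a rule that lives only in people's heads does not.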

2. Performance Issues (Most of the Time)

AI catches the most obvious N+1 queries, but less obvious ones, and complex performance problems in general, often slip through.

# AI didn't flag performance issue (but one exists)
def get_user_orders(user_id):
    orders = db.query("SELECT * FROM orders WHERE user_id = ?", user_id)
    for order in orders:
        # each order queries user separately
        user = db.query("SELECT * FROM users WHERE id = ?", order.user_id)
        order.user = user
    return orders

# AI didn't flag this as N+1, even though it runs 1 + N queries
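
A version without the per-order lookup might look like the sketch below. It assumes sqlite3 and hypothetical users and orders tables; the db object in the original snippet is not specified. Since every order belongs to the same user, the user is fetched once, giving 2 queries instead of 1 + N.

```python
import sqlite3


def get_user_orders(conn, user_id):
    """Return the user's orders with the user attached, in 2 queries total."""
    # Fetch the user once instead of once per order.
    user = conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT id, total FROM orders WHERE user_id = ? ORDER BY id", (user_id,)
    ).fetchall()
    return [{"id": oid, "total": total, "user": user} for oid, total in rows]


# Hypothetical schema and data, only here to exercise the function.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0);
""")

orders = get_user_orders(conn, 1)
print(len(orders), orders[0]["user"])
```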

3. Edge Cases and Error Handling (Complex Scenarios)

AI catches simple null checks, but it often misses problems in complex error-handling logic.

One Year of Data

We tracked AI Review Bot findings vs human-confirmed accuracy:

Issue Type                 AI Detection Rate   Human-Confirmed Valid
Security vulnerabilities   95%                 92%
SQL/N+1                    88%                 85%
Null/edge cases            75%                 70%
Business logic             12%                 40%
Performance issues         45%                 50%
Code duplication           80%                 78%

Conclusion: AI is strong on security and basic code quality, weak on business logic and complex performance.

Actual Workflow

After a PR is submitted:
  1. AI Review Bot auto-analyzes diff
  2. Comments on PR (by priority)
     - 🔴 P1: Security vulnerability (block PR)
     - 🟡 P2: Logic/edge issues (suggest fix)
     - 🟢 P3: Code style (optional)
  3. Human reviewer only checks P1 and P2
  4. P3 suggestions: developer decides
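
The routing in the workflow above can be sketched as a small triage step. The Finding class, the triage function, and the sample messages are hypothetical names for illustration; the post does not show the bot's actual data model.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    P1 = 1  # security vulnerability, blocks the PR
    P2 = 2  # logic/edge issue, fix suggested
    P3 = 3  # code style, optional


@dataclass
class Finding:
    """Hypothetical shape of one AI review comment."""
    priority: Priority
    message: str


def triage(findings):
    """Split findings the way the workflow above describes."""
    ordered = sorted(findings, key=lambda f: f.priority)
    return {
        # Any P1 finding blocks the PR.
        "block_pr": any(f.priority == Priority.P1 for f in ordered),
        # Human reviewers only check P1 and P2.
        "for_reviewer": [f for f in ordered if f.priority <= Priority.P2],
        # P3 is left to the developer.
        "for_developer": [f for f in ordered if f.priority == Priority.P3],
    }


result = triage([
    Finding(Priority.P3, "rename variable"),
    Finding(Priority.P1, "SQL injection risk"),
    Finding(Priority.P2, "price could be negative"),
])
print(result["block_pr"], [f.message for f in result["for_reviewer"]])
```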

Tool Selection

Tool                    Integration      Notes
GitHub Copilot Review   GitHub Actions   Official, but limited features
Cursor Reviews          PR comments      Viewable in IDE
Meta AI Reviewer        Self-built       Customizable rules, most flexible
SonarQube AI            CI/CD            Old scanner + AI enhanced

We ended up with a self-built reviewer running on Claude Sonnet, and wrote custom rules to filter out false positives.
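
As a rough idea of what custom false-positive rules can look like, here is a sketch that suppresses findings by file path and rule id. Everything in it is an assumption: the post does not describe the real rule format, and the Finding class, rule ids, and glob patterns are made up for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one AI review comment."""
    path: str
    rule_id: str
    message: str


# Hypothetical suppression rules: (path glob, rule-id glob).
SUPPRESSIONS = [
    ("tests/*", "hardcoded-credential"),  # fixtures often use fake secrets
    ("migrations/*", "*"),                # generated code, not reviewed
]


def is_suppressed(finding):
    return any(
        fnmatchcase(finding.path, path_glob)
        and fnmatchcase(finding.rule_id, rule_glob)
        for path_glob, rule_glob in SUPPRESSIONS
    )


def filter_findings(findings):
    """Drop findings that match a suppression rule."""
    return [f for f in findings if not is_suppressed(f)]


kept = filter_findings([
    Finding("tests/test_auth.py", "hardcoded-credential", "password in source"),
    Finding("app/auth.py", "hardcoded-credential", "password in source"),
    Finding("migrations/0001_init.py", "sql-injection", "string-built SQL"),
])
print([f.path for f in kept])
```

Keeping the rules as data rather than code makes each suppression easy to review on its own.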

Conclusion

The value of AI code review: it frees human reviewers from 80% of trivial issues.

Let human reviewers focus on business logic and architectural decisions. AI handles security scanning and basic code quality.

This is not AI replacing human reviewers. It is AI making human reviewers more valuable.