An AI Agent isn't just an LLM API call. A real Agent architecture has three components: Model, Harness, and Memory. This article dissects common Agent architecture patterns and their tradeoffs in production environments.
In mid-2024, Cursor exploded in popularity, Windsurf entered the market, Copilot got major updates, and Devin launched commercially. AI coding tools entered a Warring States era. This article maps out the real differences between the players and what each is actually like to use.
vLLM is the most popular open-source LLM inference engine today. Its PagedAttention technology delivers up to 24x higher throughput than HuggingFace Transformers on the same hardware. This article explains what vLLM is, how to deploy it, and practical considerations.
2024 H1: small open-source models exploded. Mistral 7B, Phi-3, and Gemma dropped one after another, and 3B/7B models now match GPT-3.5 performance. This article analyzes why that happened and what it means for real work.
Ollama makes running LLMs locally trivial. One command starts a model, and a 7B model runs smoothly on a Mac. This is a practical guide covering setup, real performance numbers, and when to use a local model vs. an API.