The AI Dev Tools Landscape
Cursor, Claude Code, Codex, Gemini CLI — what each tool is for, how they differ, and why "which one should I use?" has a different answer depending on what you're doing.
The instinct is to pick one tool and stick with it. The reality is that Cursor and Claude Code solve different problems — and knowing which to reach for, and when, is the real skill.
The Vibe Coding Problem
There's a real phenomenon that happens with AI coding tools:
"The code grows beyond my usual comprehension."
That line captures the failure mode perfectly. You're in a flow state. The agent is generating code. Things feel productive. Then you surface and realize you don't fully understand what was built — and the codebase no longer fits in anyone's head.
AI tools are very good at generating a lot of code quickly. But a lot of code isn't always a good thing. Anyone who has parachuted into a legacy codebase knows: more code is usually more problems. The goal of today's tools should be generating the right code, not the most code.
The good news: decades-old best practices — git discipline, test-driven development, architectural decision records, small focused commits — turn out to work extremely well as guardrails for AI tools. The discipline you've always aspired to follow now pays compounding dividends.
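The git half of those guardrails can be made concrete. A minimal sketch (the file name and commit messages are invented for illustration): each agent change lands as one small commit on its own branch, with the diff reviewed before it's committed.

```shell
# Git discipline as a guardrail: isolate agent work on a branch,
# review the diff, and land it as one small, focused commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"   # needed in a fresh repo
git config user.name "Dev"
git commit -q --allow-empty -m "chore: init"

git checkout -q -b agent/extract-helper   # the agent's work stays off main
printf 'export const clamp = (n, lo, hi) => Math.min(Math.max(n, lo), hi);\n' > clamp.js
git add clamp.js
git diff --cached --stat                  # check the scope before committing
git commit -q -m "refactor: extract clamp helper"

git log --oneline                         # one reviewable commit, easy to revert
```

If a session's diff grows beyond what you can review, a `git reset --hard` and a re-prompt with a smaller task is cheaper than merging code nobody understands.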
The Two Main Tools
Cursor
Cursor is a VS Code fork. If it looks a lot like VS Code, that's because it is — effectively a superset of VS Code with AI features built in.
The experience: you're in your editor. You can see your code and the AI's changes side by side. You accept or reject inline. The agent mode works from inside the same environment you're already in.
Best for:
- Staring at a function you need to refactor
- Inline surgical edits to specific files
- Tasks where you want to watch what's happening and stop it if needed
- Working with the code directly visible
Pricing: Free 14-day Pro trial, then a hobby plan with limits. The Pro plan covers everything in this series. You can also plug in your own API keys to bypass Cursor's billing.
Claude Code
Claude Code is a terminal app. It runs in your terminal — you're not necessarily looking at the same code the agent is touching.
The experience: more autonomous. You give it a task, it works, you review the result. You're not tied to what file you're looking at while it runs.
Best for:
- Changing function signatures across an entire codebase
- Larger refactors that span many files
- Tasks where you want to delegate and review the outcome
- Running in the background while you do other work
Pricing: No free plan. Pay via API usage or a monthly subscription. Rate-limited even on the Max plan.
Alternative if cost is a concern: Google's Gemini CLI is free within limits if you have a Google account, and covers the terminal-based paradigm well for learning.
The "Which One?" Answer
Staring at a specific function → Cursor inline edit
Refactoring one component → Cursor agent
Changing a function signature across the entire codebase → Claude Code
Large background task while you do something else → Claude Code
Brainstorming architecture → Gemini (large context window) or OpenAI o3
Code review on a PR → GitHub Copilot / Codex

The real answer is: use both, for different things. They're not competitors — they're tools with different interaction paradigms.
OpenAI Codex: The Third Paradigm
Codex takes a completely different approach. You point it at a git repo, tell it what you want, review its plan, and it opens a PR when done.
This is the traditional code reviewer paradigm — you're reviewing a pull request, not watching code get written. Useful for:
- Kicking off work remotely (away from your computer)
- Tasks you want treated like a PR you'll review in the morning
- Understanding a large codebase without running it locally
Model Selection
Every tool gives you a model picker. The choice matters more than people think.
| Model Family | Best For |
|---|---|
| Claude Sonnet (Anthropic) | Daily coding driver — best balance of quality and speed |
| Claude Opus | Complex tasks, costs 4–6x more |
| Claude Haiku | Fast, cheap, great for simple/repetitive tasks |
| Gemini Flash (Google) | Speed, large context (1M tokens), brainstorming |
| Gemini Pro | Thinking mode, understanding whole repos |
| o3 (OpenAI) | Brainstorming, "what am I missing?", factual reasoning |
Key insight: a small model with a well-detailed plan outperforms a large model with a vague request. This is the central theme of working effectively with these tools. Opus let loose on an underspecified task will produce worse results than Haiku given a precise, scoped specification.
For budget-conscious use: Sonnet hits the sweet spot of affordable and capable. Save the big models for tasks that genuinely need them.
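To make "precise, scoped specification" concrete, compare these two requests (the file, function, and helper names are invented for illustration):

```
Vague — invites the model to guess scope, style, and edge cases:
  "Clean up the auth code."

Scoped — a small model can execute this reliably:
  "In src/auth/session.ts, extract the token-refresh logic from
  refreshSession() into a pure helper refreshToken(token, now).
  Do not change the public API. Add a unit test covering an
  expired token. Touch no other files."
```

The second prompt constrains where to work, what to change, what must stay stable, and how success is verified — which is most of what a bigger model would otherwise have to infer.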
The Systems Thinking Gap
AI tools generate code quickly. But generating code has always been the less valuable part of engineering. As you move up the career ladder, the code-writing decreases and the systems thinking increases — and the pay goes up.
The interesting opportunity: you can now generate a large, messy codebase on purpose and practice navigating it. You can experience the architectural challenges that normally only come with years in production systems. That's genuinely useful for building the judgment that AI tools can't replace.
What AI tools won't do for you:
- Decide what to build
- Catch their own architectural mistakes
- Know your team's conventions unless you tell them
- Review their own output critically
What they do extremely well:
- Execute a well-specified plan
- Handle the mechanical parts of refactoring
- Generate boilerplate
- Surface possibilities you hadn't considered
The engineers getting the most leverage are the ones who bring the systems thinking and delegate the execution — not the ones who let the agent decide both.
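On "know your team's conventions unless you tell them": both main tools support a conventions file checked into the repo. Claude Code reads a CLAUDE.md at the project root, and Cursor has project rules files. The contents below are an illustrative sketch, not a prescribed format:

```markdown
# CLAUDE.md (illustrative example)

- TypeScript strict mode; no `any` without a comment explaining why.
- Tests live next to source: `foo.ts` gets `foo.test.ts`.
- Commits follow Conventional Commits (`feat:`, `fix:`, `refactor:`).
- Never edit files under `generated/`; they are build output.
```

A few lines like these get applied to every session, which beats re-explaining conventions in every prompt.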