Context Window Management

Managing what's in the context window is the single biggest lever for AI coding quality. Here's how to think about it and what to do about it.

March 30, 2026 · 5 min read

"Weeks of coding can save you hours of planning."

That sticker was on a manager's iPad in 2017. It was about humans. It turns out to be even more true for AI coding tools.

What the Context Window Actually Is

Every AI tool — Cursor, Claude Code, Gemini — has a context window. It's the working memory of the model: everything it can see at once.

When you're in a conversation with any of these tools, you're not just sending your most recent message. You're sending the entire conversation history every time, as if the model had never seen any of it before. There are no sessions, no cookies, no memory between requests — just a growing string of text sent fresh with each call.
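
That stateless loop can be sketched in a few lines. `callModel` here is a hypothetical stand-in for whatever completion API you use, not a real SDK call:

```typescript
// Each request resends the full transcript; the server keeps no state between calls.
type Message = { role: "user" | "assistant"; content: string };

const history: Message[] = [];

// `callModel` is a placeholder for any chat completion API (OpenAI, Anthropic, etc.).
function send(userText: string, callModel: (msgs: Message[]) => string): string {
  history.push({ role: "user", content: userText });
  const reply = callModel(history); // the model sees the ENTIRE history, fresh
  history.push({ role: "assistant", content: reply });
  return reply;
}
```

Every call to `send` ships a longer `history` than the last one — that growth is exactly what fills the context window.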

A token is roughly 4 characters. Gemini's context window is around 2 million tokens — that's about 8 million characters, or several full novels. But:

  1. Bigger isn't always better. Even with a massive context window, having too much stuff in context degrades quality. The model loses the thread. It's like trying to hold too many things in your head at once.

  2. The model you use determines the window. Smaller, cheaper models have smaller windows. Larger models with bigger windows are slower and more expensive.

  3. Your entire chat history is filling that window. Every exchange, every file you've attached, every output the model produced — it's all accumulating.
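
The back-of-envelope arithmetic above is easy to sketch. The 4-characters-per-token figure is a rough rule of thumb, not exact tokenizer behavior:

```typescript
// Rule of thumb from above: 1 token ≈ 4 characters. Real tokenizers vary by model.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function windowInChars(tokens: number): number {
  return tokens * CHARS_PER_TOKEN;
}

// A ~2M-token window holds roughly 8 million characters.
console.log(windowInChars(2_000_000)); // 8000000
```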

The Central Skill: Right-Sizing Context

The question isn't "how do I get the most context in?" It's "how do I get the right context in?"

Give it the relevant files, not all files

❌ @codebase — now you've potentially burned most of your context window and the model is reading deprecated legacy code to pattern match from

✅ @src/components/UserCard.tsx @src/api/users.ts — the exact two files related to this specific task

When Cursor indexes your codebase, it uses the first 250 lines of any file (100 lines when searching). If your files are 1,000 lines long, the model is guessing about most of the file. Smaller, focused files aren't just good code style — they're better for AI tools too.
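
A throwaway helper makes that concrete. The 250-line figure comes from the paragraph above; the function itself is illustrative, not part of any tool:

```typescript
import { readFileSync } from "node:fs";

// Per the paragraph above, Cursor indexes roughly the first 250 lines of a file.
const INDEX_LINE_LIMIT = 250;

// Fraction of a file the index actually sees — a quick way to spot files
// that are mostly invisible to the model.
function indexedFraction(path: string): number {
  const lineCount = readFileSync(path, "utf8").split("\n").length;
  return Math.min(1, INDEX_LINE_LIMIT / lineCount);
}
```

A 1,000-line file comes back as 0.25 — the model is guessing about the other 75%.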

Be specific about what you want

❌ "Do auth."

✅ "Use Auth0's JavaScript SDK to implement a React hook that checks for user authentication and connects to Google OAuth. The hook should return { user, isLoading, error } and handle the redirect callback."

The more specific the request, the less the model has to guess — and the less it guesses, the less it goes off in unexpected directions.
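
The payoff of that specificity is a contract you can check the output against. A hypothetical sketch — the field names come from the prompt above, not from any particular SDK:

```typescript
// Hypothetical contract implied by the ✅ prompt; not a real Auth0 type.
interface AuthState {
  user: { sub: string; email?: string } | null;
  isLoading: boolean;
  error: Error | null;
}

// While the redirect callback is still being handled, nothing is known yet.
function initialAuthState(): AuthState {
  return { user: null, isLoading: true, error: null };
}

// Once the callback resolves, exactly one of `user` or `error` is set.
function resolvedAuthState(user: AuthState["user"], error: Error | null): AuthState {
  return { user, isLoading: false, error };
}
```

With a contract like this written down, "does the generated hook match the spec?" becomes a yes/no question instead of a judgment call.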

Keep tasks small and sequential

❌ "Build me an enterprise SaaS app."

✅ Step 1: "Build a drag-and-drop file upload widget in the UI."
   Step 2: "When the user hits submit, upload the file to this S3 bucket using this SDK."
   Step 3: "Add a progress indicator during upload."

Each step can be verified before the next begins. Mistakes are caught early and cheap to fix. The model doesn't need to hold the entire plan in memory — you're managing that.
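
The verify-before-the-next-step loop can be sketched as a toy pipeline. The step names below are illustrative:

```typescript
// Toy sketch of the small-sequential-verified loop: each step runs only
// after the previous one passed its check.
type Step = { name: string; run: () => boolean };

function runPipeline(steps: Step[]): string[] {
  const verified: string[] = [];
  for (const step of steps) {
    if (!step.run()) break; // caught early, cheap to fix — stop here
    verified.push(step.name); // "commit" the verified step
  }
  return verified;
}
```

Compare with asking for all three steps at once: a failure in step 2 only surfaces after step 3 has been built on top of it.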

Plan → Execute → Verify

There are two distinct mental modes when working with AI tools:

  • Brainstorming/planning — exploring possibilities, questioning assumptions, getting a sense of the space
  • Executing — implementing a specific, agreed-upon plan

These should not happen simultaneously. Trying to figure out what you want to build while building it is how you end up with code that grew beyond your comprehension.

The workflow that works:

1. PLAN → use a capable model (O3, Gemini Pro) to explore options, question the architecture, poke holes in the idea. Write the plan to a markdown file.
2. EXECUTE → use a faster model (Sonnet, Flash) with the written plan as context. Implement one piece at a time.
3. VERIFY → review the output. Does it match the plan? Does it make sense? Trim dead code. Commit what's good. Then move to the next step.

A practical workflow some engineers use: generate the plan with ChatGPT O3 → bring it to Gemini to poke holes → bring the refined plan to Claude to implement. The brainstorming models and the execution models aren't always the same models.

The Multi-Tool Brainstorming Strategy

Step 1: "Here's what I want to build. Give me a plan." (O3 or Gemini)
Step 2: "Here's the plan. What could go wrong? What am I missing?" (different model)
Step 3: "Here's the refined plan. Implement step 1." (Sonnet / Claude Code)

You are always the arbiter. Reading the plan matters. If you take a generated plan without reading it and hand it to an agent, you've abdicated all judgment — and whatever comes out is on you.

When Things Go Off the Rails

Some signals that context has become a problem:

The bug loop: you ask it to fix a bug, it says it did, the bug is still there, repeat. This is often because the model has lost context about what the fix was supposed to do. Start a new chat.

Unexpected files appearing: v2.js, simplified.ts, the original still sitting there. The model doesn't clean up after itself — you do. Audit after every significant session.

Instructions getting ignored: you said "don't add extra features" and it added features. Either the instruction dropped out of context, or your CLAUDE.md says one thing and your Cursor rules say another. Check for conflicts in your config files.

The model changed behavior mid-session: once a chat gets long enough, the early context has dropped off. Critical constraints stated at the start of a chat need to be re-stated as the conversation grows.

Practical Habits

  • Clear the chat between distinct tasks. Don't let a session about auth carry over into a session about UI components.
  • Commit often. If the agent does something good, commit it before asking for more. Rollback becomes trivial.
  • Watch, don't queue. Stacking up multiple follow-up requests before reviewing the current one is how you end up with cascading mistakes you don't catch until it's expensive.
  • Audit at end of day. Look at what the agent actually changed. Delete dead code. Remove the files it started and abandoned. Keep your house in order.

The name of the game — the single biggest lever for quality — is managing the context window: what's in it, how much of it, and making sure it has the right things and not the wrong ones.
