Context Placement
Where you place information in a prompt matters as much as what you say. Critical info buried in the middle gets ignored — sometimes worse than saying nothing at all.
It's not just what you say in a prompt — it's where you say it.
Research shows that critical information placed in the middle of a long context gets worse model attention than information at the beginning or end. In some cases, middle-buried information performs worse than providing no information at all.
The "Lost in the Middle" Study
The paper "Lost in the Middle: How Language Models Use Long Contexts" tested models on multi-document question answering: each question's answer appeared in exactly one of the retrieved documents, and the researchers varied where that document sat in the context — early, middle, or late.
Findings:
- Beginning of context → highest accuracy
- End of context → high accuracy
- Middle of context → lower accuracy than the baseline (no documents provided)
This is striking. The model literally did better with no information than with the relevant information buried in position 7–16 of a 20-document context.
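You can reproduce the shape of this experiment yourself. The sketch below (my own code, not the paper's) assembles a multi-document context with the single relevant document at a chosen position; sweep that position and compare your model's accuracy at each one.

```python
def build_context(gold_doc: str, distractors: list[str], gold_position: int) -> str:
    """Place the one relevant document at a chosen index among distractors,
    mirroring the "Lost in the Middle" setup. A sketch for running your own
    position sweep, not the study's actual harness."""
    docs = list(distractors)
    docs.insert(gold_position, gold_doc)
    return "\n\n".join(f"Document [{i + 1}]: {d}" for i, d in enumerate(docs))

# Sweep the relevant document across all 20 positions, then ask the model
# the same question against each context and record accuracy per position.
contexts = [
    build_context("Paris is the capital of France.", ["(irrelevant filler)"] * 19, pos)
    for pos in range(20)
]
```

Plot accuracy against position and you should see the U-shape: strong at both ends, weakest in the middle.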
Why This Happens: Primacy and Recency Bias
Human psychology has names for this:
Primacy bias — we remember the beginning of a list better than the middle. Recency bias — we remember the end of a list better than the middle.
LLMs, trained on human-generated data, exhibit the same pattern: the model attends to the beginning and end of a context more strongly than to its middle.
This means long conversations and big context windows don't give you unlimited reliable attention. They give you a buffer — but not an even one.
Practical Implications
1. Critical instructions belong at the start
❌ "...and by the way, don't add any features I didn't ask for."
[buried at line 12 of a 20-line prompt]
✅ "IMPORTANT: Do not add any features beyond what is specified below.
[then the full requirements]"

2. Supporting details go at the end
Put context that provides depth or background after your core request. The model reads the main task first (primacy), sees the supporting details (recency if it's the last thing), and forms its plan.
3. Long conversations degrade silently
What you put at the end of message 5 becomes the "middle" of your conversation by message 20. If something is important, re-state it — don't assume it's still in view.
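In a scripted workflow, this re-statement can be automated. Below is a hypothetical helper (not any tool's built-in API) that re-injects your constraints every N messages so they never drift into the low-attention middle:

```python
def reinject_constraints(messages: list[str], constraints: list[str],
                         every: int = 10) -> list[str]:
    """Append a reminder message every `every` turns so critical constraints
    stay near the end of the context. Hypothetical helper, illustration only."""
    reminder = "Reminder: " + "; ".join(constraints)
    out = []
    for i, msg in enumerate(messages, start=1):
        out.append(msg)
        if i % every == 0:
            out.append(reminder)
    return out
```

In a chat UI you do the same thing by hand: periodically paste the reminder yourself.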
// After 15+ messages of code generation:
"Reminder: all functions should be pure — no side effects.
Continue building the auth system with this constraint in mind."

4. The "start a new chat" signal
If you're getting worse outputs as a conversation grows — code that's drifting from your stated conventions, features being added that you explicitly said not to — context has likely been lost or buried.
That's the signal to start a new chat. Ask the model to summarize first:
"Before we continue, summarize what we've built so far, the key constraints
we've agreed on, and what the next feature is. I'll paste this into a new chat."

Context Window vs. Useful Attention Window
A model might have a 200,000-token context window. That doesn't mean it reliably attends to all 200,000 tokens. The useful attention window — where the model actually processes information well — is smaller, and it's weighted toward the beginning and end.
The "Lost in the Middle" study found this degradation starting around 2,000–4,000 tokens in. In a tool like Copilot or Cursor with a system message, attached files, and conversation history, you can hit that zone faster than you'd expect.
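A rough budget check makes this concrete. The sketch below (my own helper, with a crude 4-characters-per-token heuristic; use a real tokenizer for accurate counts) estimates when the combined system prompt, attached files, and history cross the zone where the study observed degradation:

```python
def rough_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English prose.
    # Use a real tokenizer for accurate counts; this is a ballpark only.
    return max(1, len(text) // 4)

def context_budget(system_prompt: str, files: list[str], history: list[str],
                   danger_zone: int = 2000) -> tuple[int, bool]:
    """Estimate total context size and flag when it enters the range where
    middle-position degradation was observed. The 2,000-token threshold
    echoes the study's finding cited above; tune it for your model."""
    total = sum(rough_tokens(t) for t in [system_prompt, *files, *history])
    return total, total > danger_zone
```

One medium-sized attached file plus a system prompt is often already past the threshold, which is why degradation shows up sooner than people expect.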
Structure That Fights "Lost in the Middle"
When writing long prompts, repeat critical constraints in two places:
[TOP]
You are building a Prompt Library app.
CONSTRAINTS:
- Plain HTML, CSS, JavaScript only
- No external dependencies
- No features beyond what is listed
[... middle: detailed specifications ...]
[BOTTOM]
Remember: use only plain HTML, CSS, and JavaScript.
Implement only the features listed above — nothing else.

This exploits both primacy and recency bias to keep the most important constraints visible.
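If you build prompts programmatically, the sandwich structure is easy to encode. A minimal sketch (a hypothetical helper following the template above, not a library API):

```python
def sandwich_prompt(task: str, constraints: list[str], details: str) -> str:
    """Build a prompt that states the constraints at the top and repeats
    them at the bottom, exploiting both primacy and recency."""
    rules = "\n".join(f"- {c}" for c in constraints)
    return (
        f"{task}\n\n"
        f"CONSTRAINTS:\n{rules}\n\n"
        f"{details}\n\n"
        f"Remember the constraints:\n{rules}\n"
        "Implement only the features listed above."
    )
```

The detailed specifications sit in the middle, where losing a little attention is acceptable; the constraints never do.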
When Context is Too Much
Given the "Lost in the Middle" finding, sometimes less context is better.
If you're working in a large codebase:
- Don't paste the entire repo into context
- Don't `@codebase` if you only need 2 files
- Provide the minimal files needed for the specific task
❌ Attaching 15 files "for context"
→ Key constraints in file #8 effectively don't exist
✅ Attaching only the 2 files directly relevant to the current task
→ Everything the model sees is high-signal

Summary
| Position | Attention level | Use for |
|---|---|---|
| Beginning | Highest | Critical constraints, core task description |
| End | High | Supporting details, reminders, output format |
| Middle | Lowest | Avoid putting anything critical here |
When a conversation gets long, re-state critical information. When quality degrades, start fresh. The model's context window is large — but its useful attention window is not.