Temperature and Top P
Temperature and Top P are the two knobs that control how random or deterministic an LLM's output is — and knowing when to use each one matters.
LLMs are nondeterministic by default. But if you're building AI applications, two parameters give you real control over how predictable the output is: Temperature and Top P.
You won't see these in a chat UI like Claude.ai or ChatGPT. They're configured at the API level. But understanding them makes you a better prompt engineer — both for building AI apps and for understanding why a model behaves the way it does.
Temperature
Temperature controls how heavily the model favors the most likely next token: low values make it consistently pick the top choices, while high values spread sampling across less likely tokens.
Scale: 0.0 to 2.0. Default is usually around 1.0.
| Temperature | Behavior |
|---|---|
| 0 | Always picks the most likely token. Essentially deterministic. |
| 0.5–0.7 | Mostly predictable, occasional variation. Good for factual tasks. |
| 1.0 | Default. Balanced creativity and coherence. |
| 1.3–1.5 | More creative, more varied. Good for brainstorming. |
| 2.0 | Chaotic. Output becomes incoherent. Avoid. |
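To make the effect concrete, here's a minimal sketch (not part of any real API) of how temperature rescales a softmax over hypothetical logits for three candidate tokens:

```typescript
// Illustrative only: `logits` are made-up raw scores for three candidate tokens.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  if (temperature <= 0) {
    // Temperature 0: all probability mass on the single most likely token (greedy).
    const best = logits.indexOf(Math.max(...logits));
    return logits.map((_, i) => (i === best ? 1 : 0));
  }
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const logits = [2.0, 1.0, 0.1]; // e.g. "blue", "gray", "orange"

console.log(softmaxWithTemperature(logits, 0));   // greedy: [1, 0, 0]
console.log(softmaxWithTemperature(logits, 1.0)); // moderate spread
console.log(softmaxWithTemperature(logits, 2.0)); // flatter: rare tokens gain mass
```

Dividing the logits by a temperature above 1 flattens the distribution (unlikely tokens gain probability); dividing by a value below 1 sharpens it toward the top token.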
When to use low temperature
Anytime accuracy matters over creativity:
- Code generation — you want the correct syntax, not a "creative" interpretation
- Data extraction — parsing emails, error messages, structured data
- Healthcare or finance AI apps — the model should never guess
- Any task with a single correct answer
```typescript
// API usage example (pseudo-code)
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-6",
  temperature: 0.2, // near-deterministic for factual extraction
  messages: [{ role: "user", content: "Extract the date and amount from this invoice..." }]
});
```

When to use high temperature
When variety is the goal:
- Creative writing — novels, marketing copy, brainstorming
- Support chatbots — a more natural, conversational tone
- Generating multiple options — "give me 5 different approaches to this architecture"
Setting temperature to 2.0 doesn't make your model smarter or more creative — it makes output incoherent. Stay well below that ceiling.
Top P (Nucleus Sampling)
Top P is an alternative way of controlling randomness. Instead of reshaping the probability of every token, it removes low-probability tokens from consideration entirely.
Scale: 0.0 to 1.0. Default is 1.0 (consider all tokens).
Think of it like this: if the token distribution for "what color is the sky?" looks like this:
| Token | Probability |
|---|---|
| blue | 75% |
| gray | 20% |
| orange | 5% |
With Top P = 0.5, the model keeps only the smallest set of top tokens whose cumulative probability reaches 50%. Blue alone (75%) already covers that, so gray (20%) and orange (5%) are cut entirely: you'll only ever get "blue."
With Top P = 1.0 (default), all options stay in play.
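Using the toy distribution above, nucleus filtering can be sketched like this (`topPFilter` is a hypothetical helper, not a library API):

```typescript
type TokenProb = { token: string; prob: number };

// Keep the smallest set of highest-probability tokens whose cumulative mass reaches topP.
function topPFilter(dist: TokenProb[], topP: number): TokenProb[] {
  const sorted = [...dist].sort((a, b) => b.prob - a.prob);
  const kept: TokenProb[] = [];
  let cumulative = 0;
  for (const t of sorted) {
    kept.push(t);
    cumulative += t.prob;
    if (cumulative >= topP) break;
  }
  return kept;
}

const dist: TokenProb[] = [
  { token: "blue", prob: 0.75 },
  { token: "gray", prob: 0.20 },
  { token: "orange", prob: 0.05 },
];

console.log(topPFilter(dist, 0.5)); // only "blue": 0.75 alone covers the 0.5 mass
console.log(topPFilter(dist, 1.0).length); // 3: all tokens stay in play
```

The surviving tokens are then renormalized and sampled from as usual, which is why Top P is a hard cutoff rather than a reweighting.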
Temperature vs Top P
They solve slightly different problems:
- Temperature — scales the probability distribution up or down (more or less spread)
- Top P — truncates the bottom of the distribution (hard cutoff on unlikely tokens)
You can use them together. A common pattern for serious business applications:
```typescript
temperature: 0.8, // some creativity allowed
top_p: 0.5        // but never the bottom 50% of unlikely tokens
```

This gives you a model that sounds natural but won't go wildly off-script.
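The combined pattern above can be sketched end to end. This is an illustrative toy sampler, not how any provider implements it internally; `sampleWithTempAndTopP` and its inputs are assumptions for demonstration:

```typescript
// Toy sampler: temperature reshapes the distribution, then Top P truncates its tail.
function sampleWithTempAndTopP(
  probs: Map<string, number>,
  temperature: number,
  topP: number,
  rand: () => number = Math.random
): string {
  // 1. Temperature: raise each probability to 1/T and renormalize.
  //    T < 1 sharpens the distribution; T > 1 flattens it.
  const entries = [...probs.entries()].map(
    ([tok, p]) => [tok, Math.pow(p, 1 / temperature)] as [string, number]
  );
  const total = entries.reduce((s, [, p]) => s + p, 0);
  const scaled = entries
    .map(([tok, p]) => [tok, p / total] as [string, number])
    .sort((a, b) => b[1] - a[1]);

  // 2. Top P: keep the smallest prefix whose cumulative mass reaches topP.
  const nucleus: [string, number][] = [];
  let mass = 0;
  for (const entry of scaled) {
    nucleus.push(entry);
    mass += entry[1];
    if (mass >= topP) break;
  }

  // 3. Sample from the renormalized nucleus.
  let r = rand() * mass;
  for (const [tok, p] of nucleus) {
    r -= p;
    if (r <= 0) return tok;
  }
  return nucleus[nucleus.length - 1][0];
}
```

With the sky-color distribution from earlier, `temperature: 0.8` and `top_p: 0.5` still yield "blue" every time, because the sharpened top token alone fills the nucleus.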
The Practical Takeaway
If you're just prompting in chat — you don't control these. The provider sets them. Claude's chat defaults are tuned to feel helpful and natural.
Where this matters:
- Building AI apps: Adjust temperature down for factual/code tasks, up for creative ones
- Debugging unexpected output: A model giving you weird, off-topic responses might have a high temperature set somewhere upstream
- Understanding why the same prompt gives different results: It's not the model "being broken" — it's sampling at work
The goal isn't to make LLMs deterministic (you can't, fully). The goal is to make them predictable enough for the task at hand.