Temperature and Top P
Temperature and Top P are the two knobs that control how random or deterministic an LLM's output is — and knowing when to use each one matters.
LLMs are nondeterministic by default. But if you're building AI applications, two parameters give you real control over how predictable the output is: Temperature and Top P.
You won't see these in a chat UI like Claude.ai or ChatGPT. They're configured at the API level. But understanding them makes you a better prompt engineer — both for building AI apps and for understanding why a model behaves the way it does.
Temperature
Temperature controls how heavily the model favors the most likely next token: low values make it consistently pick the top choices, while high values spread sampling across less likely tokens.
Scale: 0.0 to 2.0. Default is usually around 1.0.
| Temperature | Behavior |
|---|---|
| 0 | Always picks the most likely token. Essentially deterministic. |
| 0.5–0.7 | Mostly predictable, occasional variation. Good for factual tasks. |
| 1.0 | Default. Balanced creativity and coherence. |
| 1.3–1.5 | More creative, more varied. Good for brainstorming. |
| 2.0 | Chaotic. Output becomes incoherent. Avoid. |
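To make the effect concrete, here's a minimal sketch (not part of any real API) of how temperature rescales a softmax over hypothetical logits for three candidate tokens:

```typescript
// Illustrative only: `logits` are made-up raw scores for three candidate tokens.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  if (temperature <= 0) {
    // Temperature 0: all probability mass on the single most likely token (greedy).
    const best = logits.indexOf(Math.max(...logits));
    return logits.map((_, i) => (i === best ? 1 : 0));
  }
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const logits = [2.0, 1.0, 0.1]; // e.g. "blue", "gray", "orange"

console.log(softmaxWithTemperature(logits, 0));   // greedy: [1, 0, 0]
console.log(softmaxWithTemperature(logits, 1.0)); // moderate spread
console.log(softmaxWithTemperature(logits, 2.0)); // flatter: rare tokens gain mass
```

Dividing the logits by a temperature above 1 flattens the distribution (unlikely tokens gain probability); dividing by a value below 1 sharpens it toward the top token.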
When to use low temperature
Anytime accuracy matters over creativity:
- Code generation — you want the correct syntax, not a "creative" interpretation
- Data extraction — parsing emails, error messages, structured data
- Healthcare or finance AI apps — the model should never guess
- Any task with a single correct answer
```typescript
// API usage example (pseudo-code)
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-6",
  temperature: 0.2, // near-deterministic for factual extraction
  messages: [{ role: "user", content: "Extract the date and amount from this invoice..." }]
});
```

When to use high temperature
When variety is the goal:
- Creative writing — novels, marketing copy, brainstorming
- Support chatbots — a more natural, conversational tone
- Generating multiple options — "give me 5 different approaches to this architecture"
Setting temperature to 2.0 doesn't make your model smarter or more creative — it makes output incoherent. Stay well below that ceiling.
Top P (Nucleus Sampling)
Top P is an alternative way of controlling randomness. Instead of reshaping the probability of every token, it removes low-probability tokens from consideration entirely.
Scale: 0.0 to 1.0. Default is 1.0 (consider all tokens).
Think of it like this: if the token distribution for "what color is the sky?" looks like this:
| Token | Probability |
|---|---|
| blue | 75% |
| gray | 20% |
| orange | 5% |
With Top P = 0.5, the model keeps only the smallest set of top tokens whose cumulative probability reaches 50%. Blue alone (75%) already covers that, so gray (20%) and orange (5%) are cut entirely: you'll only ever get "blue."
With Top P = 1.0 (default), all options stay in play.
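Using the toy distribution above, nucleus filtering can be sketched like this (`topPFilter` is a hypothetical helper, not a library API):

```typescript
type TokenProb = { token: string; prob: number };

// Keep the smallest set of highest-probability tokens whose cumulative mass reaches topP.
function topPFilter(dist: TokenProb[], topP: number): TokenProb[] {
  const sorted = [...dist].sort((a, b) => b.prob - a.prob);
  const kept: TokenProb[] = [];
  let cumulative = 0;
  for (const t of sorted) {
    kept.push(t);
    cumulative += t.prob;
    if (cumulative >= topP) break;
  }
  return kept;
}

const dist: TokenProb[] = [
  { token: "blue", prob: 0.75 },
  { token: "gray", prob: 0.20 },
  { token: "orange", prob: 0.05 },
];

console.log(topPFilter(dist, 0.5)); // only "blue": 0.75 alone covers the 0.5 mass
console.log(topPFilter(dist, 1.0).length); // 3: all tokens stay in play
```

The surviving tokens are then renormalized and sampled from as usual, which is why Top P is a hard cutoff rather than a reweighting.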
Temperature vs Top P
They solve slightly different problems:
- Temperature — scales the probability distribution up or down (more or less spread)
- Top P — truncates the bottom of the distribution (hard cutoff on unlikely tokens)
You can use them together. A common pattern for serious business applications:
```typescript
temperature: 0.8, // some creativity allowed
top_p: 0.5        // but never the bottom 50% of unlikely tokens
```

This gives you a model that sounds natural but won't go wildly off-script.
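The combined pattern above can be sketched end to end. This is an illustrative toy sampler, not how any provider implements it internally; `sampleWithTempAndTopP` and its inputs are assumptions for demonstration:

```typescript
// Toy sampler: temperature reshapes the distribution, then Top P truncates its tail.
function sampleWithTempAndTopP(
  probs: Map<string, number>,
  temperature: number,
  topP: number,
  rand: () => number = Math.random
): string {
  // 1. Temperature: raise each probability to 1/T and renormalize.
  //    T < 1 sharpens the distribution; T > 1 flattens it.
  const entries = [...probs.entries()].map(
    ([tok, p]) => [tok, Math.pow(p, 1 / temperature)] as [string, number]
  );
  const total = entries.reduce((s, [, p]) => s + p, 0);
  const scaled = entries
    .map(([tok, p]) => [tok, p / total] as [string, number])
    .sort((a, b) => b[1] - a[1]);

  // 2. Top P: keep the smallest prefix whose cumulative mass reaches topP.
  const nucleus: [string, number][] = [];
  let mass = 0;
  for (const entry of scaled) {
    nucleus.push(entry);
    mass += entry[1];
    if (mass >= topP) break;
  }

  // 3. Sample from the renormalized nucleus.
  let r = rand() * mass;
  for (const [tok, p] of nucleus) {
    r -= p;
    if (r <= 0) return tok;
  }
  return nucleus[nucleus.length - 1][0];
}
```

With the sky-color distribution from earlier, `temperature: 0.8` and `top_p: 0.5` still yield "blue" every time, because the sharpened top token alone fills the nucleus.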
The Practical Takeaway
If you're just prompting in chat — you don't control these. The provider sets them. Claude's chat defaults are tuned to feel helpful and natural.
Where this matters:
- Building AI apps: Adjust temperature down for factual/code tasks, up for creative ones
- Debugging unexpected output: A model giving you weird, off-topic responses might have a high temperature set somewhere upstream
- Understanding why the same prompt gives different results: It's not the model "being broken" — it's sampling at work
The goal isn't to make LLMs deterministic (you can't, fully). The goal is to make them predictable enough for the task at hand.