Definition: Model parameter controlling how random the output is during sampling. Temperature 0 makes the model pick the most probable next token at each step; higher values introduce more variation. Pin to 0 for reproducibility, raise for exploration.
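Temperature rescales the model's next-token probability distribution at sampling time: the logits are divided by the temperature before the softmax is applied. At temperature 0, sampling reduces to greedy decoding, where the model always picks the most probable next token. Raising the temperature flattens the distribution, so lower-probability tokens are chosen more often. Common defaults sit around 0.7 to 1.0, where output is varied but still coherent.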
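The scaling described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's actual implementation: the function names `apply_temperature` and `sample_token` and the example logits are hypothetical, and treating temperature 0 as a plain argmax is an assumption about how a sampler might special-case it.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Return next-token probabilities after temperature scaling (illustrative)."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Assumed special case: greedy decoding puts all mass on the top logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    # Divide logits by the temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = apply_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical logits for a three-token vocabulary.
logits = [2.0, 1.0, 0.1]
rng = random.Random(0)

for t in (0, 0.5, 1.0, 2.0):
    probs = apply_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], sample_token(logits, t, rng))
```

Lower temperatures concentrate probability on the top token, and higher ones spread it across the alternatives, which is the behavior the entry describes.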
For research tasks where you want consistent outputs, set temperature to 0. Reproducible experiments, structured extraction, code generation, and tasks that require an audit trail all benefit. For exploration, brainstorming, or generating multiple candidate phrasings, raise it.
A common confusion: temperature 0 does not guarantee identical outputs across runs. Model swaps by the vendor, floating-point non-determinism on GPUs, and batching effects can still cause drift. Temperature 0 plus a pinned seed plus a pinned model version comes close, but it is still not guaranteed to be bit-for-bit deterministic.
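One reason greedy decoding can still drift is that floating-point addition is not associative, so the same numbers summed in a different order can give slightly different results. The sketch below is a CPU-only toy illustration of that effect, not a reproduction of GPU or batching behavior; the helper `greedy_pick` and the scores are hypothetical.

```python
def greedy_pick(scores):
    """Return the index of the largest score (first index wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


# The same three terms, accumulated in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# The two sums differ in the last bit.
orders_disagree = left_to_right != right_to_left

# Hypothetical near-tie: token 0 has a fixed score of 0.6, while token 1's
# score is the accumulated sum. The accumulation order decides the winner.
pick_a = greedy_pick([0.6, left_to_right])
pick_b = greedy_pick([0.6, right_to_left])

print(orders_disagree, pick_a, pick_b)
```

When two candidate tokens are nearly tied, a last-bit difference like this is enough to change the greedy choice, and every later token then depends on a different prefix.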
This term is referenced in the following articles: