A model class that spends extra compute thinking before answering, generating intermediate reasoning steps that improve performance on math, code, and complex analysis. Slower and more expensive per response; materially better on hard problems. Evaluate these models differently from fast chat models: latency and cost go up, and the gains appear only when the task is actually hard.
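One way to act on that evaluation advice is to split results by task difficulty before comparing models, so that gains on hard tasks are not averaged away by easy ones. The sketch below is a minimal illustration, not a standard harness: the record keys (`model`, `difficulty`, `correct`, `latency_s`, `cost_usd`) and the function name `summarize` are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def summarize(records):
    """Average accuracy, latency, and cost per (model, difficulty) group.

    Each record is a dict with hypothetical keys chosen for this sketch:
    "model" (str), "difficulty" (str), "correct" (bool),
    "latency_s" (float), and "cost_usd" (float).
    """
    groups = defaultdict(list)
    for record in records:
        groups[(record["model"], record["difficulty"])].append(record)

    return {
        key: {
            "accuracy": mean(1.0 if r["correct"] else 0.0 for r in rows),
            "mean_latency_s": mean(r["latency_s"] for r in rows),
            "mean_cost_usd": mean(r["cost_usd"] for r in rows),
            "n": len(rows),
        }
        for key, rows in groups.items()
    }
```

Comparing the "hard" rows for a reasoning model against the same rows for a fast chat model shows whether the extra latency and cost buy any accuracy on the tasks that matter.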
An AI system trained on vast amounts of text to predict and generate human-like language. Best understood as a concept-transformation engine rather than a knowledge database.
Running a trained model to produce output, as opposed to training it. Every API call to a language model is inference, and inference cost per token is what shapes LLM economics.
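The per-token arithmetic can be sketched as follows. This is a minimal illustration under stated assumptions: the `Pricing` class, the `call_cost` function, and the prices used are hypothetical placeholders, not any provider's actual rates, and the split into separate input and output prices is an assumption about how billing is structured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per million tokens (placeholders only)."""
    input_per_million: float
    output_per_million: float


def call_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one inference call under the given pricing."""
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    )
    return total / 1_000_000


# Placeholder prices chosen only to make the arithmetic easy to follow.
example_pricing = Pricing(input_per_million=1.0, output_per_million=4.0)
cost = call_cost(example_pricing, input_tokens=2_000, output_tokens=500)
print(f"{cost:.4f} USD per call")
```

Because cost scales linearly with token counts, a model that generates many more output tokens per answer costs proportionally more per call, even at the same per-token price.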
A trained system that maps input to output through learned patterns. In research practice the term usually means a large language model, but it also covers image, audio, and multimodal systems.