Running a trained model to produce output, as opposed to training it. Every API call to a language model is inference, and inference cost per token is what shapes LLM economics.
Definition: Running a trained model to produce output, as opposed to training it. Every API call to a language model is inference, and inference cost per token is what shapes LLM economics.
The act of running a model to produce output, as opposed to training it. Every API call you make is inference. Inference is what you pay for. Inference cost per token, plus rate limits and throttling, is what makes 2026 LLM economics complicated.
The unit a language model processes input and produces output in. Roughly four characters of English on average, less for code and non-Latin scripts.
The structured way one piece of software talks to another. API access to a foundational model gives you control over prompts, parameters, version pinning, and data flow that no chat interface offers.
An AI system trained on vast amounts of text to predict and generate human-like language. Best understood as a concept-transformation engine rather than a knowledge database.