The unit in which a language model reads input and produces output. Roughly four characters of English on average, fewer for code and non-Latin scripts.
Cost, rate limits, and context window are all measured in tokens, not words. Your prose usually costs more tokens than its word count suggests.
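The four-characters-per-token rule of thumb can be turned into a quick estimate. The sketch below is only an approximation under that assumption; `estimate_tokens` and `CHARS_PER_TOKEN` are hypothetical names introduced for illustration, and a real tokenizer will give different counts, especially for code and non-Latin scripts.

```python
import math

# Assumption: roughly four characters of English per token, per the rule of
# thumb above. Real tokenizers vary by model and by the kind of text.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Return a rough token estimate based only on character count."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


if __name__ == "__main__":
    sentence = "Cost, rate limits, and context window are all measured in tokens."
    word_count = len(sentence.split())
    token_estimate = estimate_tokens(sentence)
    # The estimate is typically higher than the word count for English prose.
    print(f"words: {word_count}, estimated tokens: {token_estimate}")
```

For anything that affects billing or hard limits, count with the tokenizer that belongs to the model you are calling rather than relying on this heuristic.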
The total number of tokens a model accepts in a single request, counting input and output together. Larger windows raise cost and latency, and output quality often degrades as a request approaches the limit.
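Because input and output share one budget, a long prompt leaves less room for the reply. A minimal sketch of that arithmetic follows; the function names and the 8,000-token window are hypothetical values chosen for illustration, not the limits of any particular model.

```python
def remaining_output_budget(input_tokens: int, context_window: int) -> int:
    """Tokens left for the model's output once the input is counted."""
    return max(0, context_window - input_tokens)


def fits_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the input plus the requested output fits in the window."""
    return input_tokens + max_output_tokens <= context_window


if __name__ == "__main__":
    window = 8_000  # hypothetical context window size
    prompt_tokens = 7_200
    print(remaining_output_budget(prompt_tokens, window))
    print(fits_context(prompt_tokens, 1_000, window))
```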
Running a trained model to produce output, as opposed to training it. Every API call to a language model is inference, and inference cost per token is what shapes LLM economics.
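Per-token pricing makes the cost of a single call simple to estimate. The sketch below assumes input and output tokens are priced separately per million tokens, which is a common arrangement but not something the definition above specifies; `request_cost` and the example prices are hypothetical.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimated cost of one inference call, in the same currency as the prices."""
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical prices: 2.00 per million input tokens, 8.00 per million output tokens.
    cost = request_cost(500_000, 250_000, 2.0, 8.0)
    print(f"estimated cost: {cost:.2f}")
```

Multiplying the per-call figure by expected request volume gives a first-order view of what a feature will cost to run.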
The text input you send to a language model. Most 'AI doesn't work' complaints trace back to prompt quality before they trace to model quality.