Utilix knowledge base
What Is an LLM Token? How AI Models Count Text
Published May 3, 2026
When you send a message to an AI model, the model does not see words or characters — it sees tokens. A token is the basic unit of text that large language models (LLMs) process. Understanding tokens helps you predict costs, avoid truncation, and write more efficient prompts.
Why Tokens, Not Words?
AI models are trained on a tokenized version of text. The tokenizer — the component that splits text into tokens — learns which sequences of characters appear together frequently. Common words become single tokens; rare words or specialized terms get split into multiple tokens.
This approach lets the model handle any language and vocabulary, including code, URLs, and made-up words, without needing a finite dictionary.
How Many Characters Is a Token?
For typical English prose, one token equals roughly 4 characters or 0.75 words. Some common rules of thumb:
| Content type | Approximate ratio |
|---|---|
| English prose | 1 token ≈ 4 characters or 0.75 words |
| 1,000 words | ≈ 1,300–1,400 tokens |
| 1 page of text (~500 words) | ≈ 650–700 tokens |
| Source code | 1 token ≈ 3–5 characters (shorter, denser) |
| Non-English languages | Often more tokens per word than English |
These are approximations. The exact count depends on the model's tokenizer.
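The ratios above can be turned into a quick back-of-the-envelope estimator. This is a hypothetical helper built only from the rules of thumb in the table, not a real tokenizer; when an exact count matters, use the model's own tokenizer.

```python
# Rough token estimate from the rules of thumb above: ~4 characters
# or ~0.75 words per token for English prose. This is a hypothetical
# helper, not a real tokenizer -- exact counts vary by model.
def estimate_tokens(text: str) -> int:
    by_chars = len(text) / 4              # 1 token ~ 4 characters
    by_words = len(text.split()) / 0.75   # 1 token ~ 0.75 words
    return round((by_chars + by_words) / 2)  # average the two estimates

print(estimate_tokens("word " * 1000))  # ~1,300 tokens for 1,000 words
```

For English prose the two estimates usually land close together; for code or non-English text they diverge, which is itself a hint that the rules of thumb no longer apply.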
Token Boundaries
The word "hamburger" typically becomes 3 tokens: ham, bur, ger. The word "the" is almost always 1 token. Spaces, punctuation, and capitalization all affect token boundaries.
Common patterns:
- Short, common words: 1 token each
- Long or technical words: 2–4 tokens
- Numbers: often split into chunks of 1–3 digits, depending on the tokenizer
- URLs: many tokens, since slashes, dots, and hyphens usually start new tokens
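A toy greedy longest-match tokenizer makes the splitting behavior concrete. Real tokenizers use a learned merge table (byte-pair encoding) with tens of thousands of entries; the tiny hand-picked vocabulary below only illustrates why rare words split into pieces while common words stay whole.

```python
# Toy subword tokenizer: greedy longest-match against a tiny
# hand-picked vocabulary. Real BPE tokenizers learn ~50k-200k
# merges from data; this is an illustration, not the real algorithm.
TOY_VOCab = None  # placeholder removed below
TOY_VOCAB = {"the", "ham", "bur", "ger", "token"}

def toy_tokenize(word: str, vocab: set) -> list:
    tokens, i = [], 0
    while i < len(word):
        # try the longest matching piece first
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown: fall back to one character
            i += 1
    return tokens

print(toy_tokenize("hamburger", TOY_VOCAB))  # ['ham', 'bur', 'ger']
print(toy_tokenize("the", TOY_VOCAB))        # ['the']
```

The greedy longest-match loop is why capitalization and spacing matter: change one character and the longest matching piece changes, shifting every boundary after it.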
Input Tokens vs Output Tokens
AI API pricing separates input and output tokens:
- Input tokens — everything you send to the model: system prompt, conversation history, and the user message
- Output tokens — everything the model generates in its response
Output tokens typically cost 3–5× as much as input tokens, because generating text is more compute-intensive than reading it.
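The asymmetry shows up directly in per-request cost. The sketch below uses illustrative rates of $2.50 and $10.00 per million tokens (roughly GPT-4o's published input and output pricing at the time of writing; check your provider's pricing page for current numbers).

```python
# Cost of one request when input and output tokens are priced
# separately. Rates are dollars per million tokens and are
# illustrative (~GPT-4o pricing); they change over time.
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = 2.50,
                 output_rate: float = 10.00) -> float:
    return (input_tokens * input_rate
            + output_tokens * output_rate) / 1_000_000

# A 1,000-token prompt with a 500-token reply: the output is only a
# third of the tokens but two-thirds of the cost at a 4x rate gap.
print(f"${request_cost(1000, 500):.6f}")  # $0.007500
```

This is why asking a model for concise answers, or capping `max_tokens`, often saves more money than trimming the prompt.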
Context Window
Every model has a context window — the maximum total tokens it can process in a single request (input + output combined). Common limits:
| Model | Context window |
|---|---|
| GPT-4o | 128,000 tokens |
| Claude Sonnet 4 | 200,000 tokens |
| Gemini 1.5 Pro | 1,000,000 tokens |
If your conversation history plus your prompt exceeds the context limit, older messages are dropped or the request fails.
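A common fix is to drop the oldest messages until the conversation fits, while reserving room for the model's reply. The sketch below uses a stand-in 4-characters-per-token estimate; a real client would count with the model's tokenizer, and would usually pin the system prompt separately rather than let it be trimmed.

```python
# Sketch: drop oldest messages until the estimated token count fits
# the context window, reserving space for the reply. count_tokens is
# a stand-in estimate, not a real tokenizer.
def count_tokens(message: str) -> int:
    return max(1, len(message) // 4)  # rough 4-chars-per-token rule

def trim_history(messages: list, context_limit: int,
                 reserved_output: int) -> list:
    budget = context_limit - reserved_output  # leave room for the reply
    trimmed = list(messages)
    while trimmed and sum(count_tokens(m) for m in trimmed) > budget:
        trimmed.pop(0)  # drop the oldest message first
    return trimmed
```

Dropping from the front preserves recent context, which usually matters most, at the cost of the model "forgetting" the start of the conversation.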
Why This Matters for Costs
API pricing is charged per 1,000 or per 1,000,000 tokens. A 100-word prompt is roughly 130 input tokens. At GPT-4o's rate of $2.50 per million input tokens, that is $0.000325, a tiny fraction of a cent. But at scale (10,000 requests per day), token counts multiply quickly.
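Running that arithmetic forward shows how a fraction of a cent compounds. The figures below reuse the numbers from the paragraph above (130 input tokens per request, $2.50 per million input tokens, 10,000 requests per day); output tokens, which cost more, are left out to match the text.

```python
# The per-request arithmetic above, scaled to 10,000 requests/day.
# Input tokens only; output tokens (priced higher) are excluded.
RATE_PER_M_INPUT = 2.50       # dollars per 1M input tokens
tokens_per_request = 130      # ~100-word prompt

per_request = tokens_per_request * RATE_PER_M_INPUT / 1_000_000
per_day = per_request * 10_000
per_month = per_day * 30

print(f"per request: ${per_request:.6f}")  # $0.000325
print(f"per day:     ${per_day:.2f}")      # $3.25
print(f"per month:   ${per_month:.2f}")    # $97.50
```

A negligible per-request cost becomes a real monthly line item, and that is before adding output tokens or a longer system prompt, which multiply every request.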
Understanding token counts helps you:
- Budget API usage before building a product
- Decide whether to use a cheap model (GPT-4o mini) or a premium one (GPT-4o)
- Trim system prompts to reduce per-request cost
- Know when you are approaching a context limit