Utilix knowledge base
What Is an LLM Token? How AI Models Count Text
Published May 3, 2026
When you send a message to an AI model, the model does not see words or characters — it sees tokens. A token is the basic unit of text that large language models (LLMs) process. Understanding tokens helps you predict costs, avoid truncation, and write more efficient prompts.
Why Tokens, Not Words?
AI models are trained on a tokenized version of text. The tokenizer — the component that splits text into tokens — learns which sequences of characters appear together frequently. Common words become single tokens; rare words or specialized terms get split into multiple tokens.
This approach lets the model handle any language and vocabulary, including code, URLs, and made-up words, without needing a finite dictionary.
How Many Characters Is a Token?
For typical English prose, one token equals roughly 4 characters or 0.75 words. Some common rules of thumb:
| Content type | Approximate ratio |
|---|---|
| English prose | 1 token ≈ 4 characters or 0.75 words |
| 1,000 words | ≈ 1,300–1,400 tokens |
| 1 page of text (~500 words) | ≈ 650–700 tokens |
| Source code | 1 token ≈ 3–5 characters (shorter, denser) |
| Non-English languages | Often more tokens per word than English |
These are approximations. The exact count depends on the model's tokenizer.
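The ratios above can be turned into a quick back-of-the-envelope estimator. This is a hypothetical helper built only from the rules of thumb in the table, not a real tokenizer; when an exact count matters, use the model's own tokenizer.

```python
# Rough token estimate from the rules of thumb above: ~4 characters
# or ~0.75 words per token for English prose. This is a hypothetical
# helper, not a real tokenizer -- exact counts vary by model.
def estimate_tokens(text: str) -> int:
    by_chars = len(text) / 4              # 1 token ~ 4 characters
    by_words = len(text.split()) / 0.75   # 1 token ~ 0.75 words
    return round((by_chars + by_words) / 2)  # average the two estimates

print(estimate_tokens("word " * 1000))  # ~1,300 tokens for 1,000 words
```

For English prose the two estimates usually land close together; for code or non-English text they diverge, which is itself a hint that the rules of thumb no longer apply.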
Token Boundaries
The word "hamburger" typically becomes 3 tokens: ham, bur, ger. The word "the" is almost always 1 token. Spaces, punctuation, and capitalization all affect token boundaries.
Common patterns:
- Short, common words: 1 token each
- Long or technical words: 2–4 tokens
- Numbers: often split into chunks of 1–3 digits, depending on the tokenizer
- URLs: many tokens, since slashes, dots, and hyphens usually start new tokens
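A toy greedy longest-match tokenizer makes the splitting behavior concrete. Real tokenizers use a learned merge table (byte-pair encoding) with tens of thousands of entries; the tiny hand-picked vocabulary below only illustrates why rare words split into pieces while common words stay whole.

```python
# Toy subword tokenizer: greedy longest-match against a tiny
# hand-picked vocabulary. Real BPE tokenizers learn ~50k-200k
# merges from data; this is an illustration, not the real algorithm.
TOY_VOCab = None  # placeholder removed below
TOY_VOCAB = {"the", "ham", "bur", "ger", "token"}

def toy_tokenize(word: str, vocab: set) -> list:
    tokens, i = [], 0
    while i < len(word):
        # try the longest matching piece first
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown: fall back to one character
            i += 1
    return tokens

print(toy_tokenize("hamburger", TOY_VOCAB))  # ['ham', 'bur', 'ger']
print(toy_tokenize("the", TOY_VOCAB))        # ['the']
```

The greedy longest-match loop is why capitalization and spacing matter: change one character and the longest matching piece changes, shifting every boundary after it.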
Input Tokens vs Output Tokens
AI API pricing separates input and output tokens:
- Input tokens — everything you send to the model: system prompt, conversation history, and the user message
- Output tokens — everything the model generates in its response
Output tokens typically cost 3–5× as much as input tokens, because generating text is more compute-intensive than reading it.
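The asymmetry shows up directly in per-request cost. The sketch below uses illustrative rates of $2.50 and $10.00 per million tokens (roughly GPT-4o's published input and output pricing at the time of writing; check your provider's pricing page for current numbers).

```python
# Cost of one request when input and output tokens are priced
# separately. Rates are dollars per million tokens and are
# illustrative (~GPT-4o pricing); they change over time.
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = 2.50,
                 output_rate: float = 10.00) -> float:
    return (input_tokens * input_rate
            + output_tokens * output_rate) / 1_000_000

# A 1,000-token prompt with a 500-token reply: the output is only a
# third of the tokens but two-thirds of the cost at a 4x rate gap.
print(f"${request_cost(1000, 500):.6f}")  # $0.007500
```

This is why asking a model for concise answers, or capping `max_tokens`, often saves more money than trimming the prompt.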
Context Window
Every model has a context window — the maximum total tokens it can process in a single request (input + output combined). Common limits:
| Model | Context window |
|---|---|
| GPT-4o | 128,000 tokens |
| Claude Sonnet 4 | 200,000 tokens |
| Gemini 1.5 Pro | 1,000,000 tokens |
If your conversation history plus your prompt exceeds the context limit, older messages are dropped or the request fails.
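A common fix is to drop the oldest messages until the conversation fits, while reserving room for the model's reply. The sketch below uses a stand-in 4-characters-per-token estimate; a real client would count with the model's tokenizer, and would usually pin the system prompt separately rather than let it be trimmed.

```python
# Sketch: drop oldest messages until the estimated token count fits
# the context window, reserving space for the reply. count_tokens is
# a stand-in estimate, not a real tokenizer.
def count_tokens(message: str) -> int:
    return max(1, len(message) // 4)  # rough 4-chars-per-token rule

def trim_history(messages: list, context_limit: int,
                 reserved_output: int) -> list:
    budget = context_limit - reserved_output  # leave room for the reply
    trimmed = list(messages)
    while trimmed and sum(count_tokens(m) for m in trimmed) > budget:
        trimmed.pop(0)  # drop the oldest message first
    return trimmed
```

Dropping from the front preserves recent context, which usually matters most, at the cost of the model "forgetting" the start of the conversation.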
Why This Matters for Costs
API pricing is charged per 1,000 or per 1,000,000 tokens. A 100-word prompt is roughly 130 input tokens. At GPT-4o's rate of $2.50 per million input tokens, that is $0.000325, a tiny fraction of a cent. But at scale (10,000 requests per day), token counts multiply quickly.
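Running that arithmetic forward shows how a fraction of a cent compounds. The figures below reuse the numbers from the paragraph above (130 input tokens per request, $2.50 per million input tokens, 10,000 requests per day); output tokens, which cost more, are left out to match the text.

```python
# The per-request arithmetic above, scaled to 10,000 requests/day.
# Input tokens only; output tokens (priced higher) are excluded.
RATE_PER_M_INPUT = 2.50       # dollars per 1M input tokens
tokens_per_request = 130      # ~100-word prompt

per_request = tokens_per_request * RATE_PER_M_INPUT / 1_000_000
per_day = per_request * 10_000
per_month = per_day * 30

print(f"per request: ${per_request:.6f}")  # $0.000325
print(f"per day:     ${per_day:.2f}")      # $3.25
print(f"per month:   ${per_month:.2f}")    # $97.50
```

A negligible per-request cost becomes a real monthly line item, and that is before adding output tokens or a longer system prompt, which multiply every request.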
Understanding token counts helps you:
- Budget API usage before building a product
- Decide whether to use a cheap model (GPT-4o mini) or a premium one (GPT-4o)
- Trim system prompts to reduce per-request cost
- Know when you are approaching a context limit