LLM API Pricing Explained: Input Tokens, Output Tokens, and What Drives Costs

Every major AI model API charges separately for what you send and what the model generates. Understanding this split is the most important step toward controlling your AI budget, because output tokens almost always cost more than input tokens, often 3–6× as much.

The Basic Formula

Cost per request = (input_tokens / 1,000,000) × input_price_per_M
                 + (output_tokens / 1,000,000) × output_price_per_M

Input tokens include your system prompt, conversation history, and the user's message. Output tokens are everything the model generates in its response. Both are billed per million (M) tokens.

Example: GPT-5.4 at $2.50 input / $15.00 output per 1M tokens. A request with 1,000 input tokens and 500 output tokens costs:

Input: (1,000 / 1,000,000) × $2.50 = $0.0025
Output: (500 / 1,000,000) × $15.00 = $0.0075
Total: $0.01 per request

At 1,000 requests/day, that is $10/day, or about $300 over a 30-day month.
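
The formula and worked example above can be sketched in a few lines of Python. The function names and the 30-day month are assumptions made for illustration; the prices are the ones quoted in the example.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars for one request, with prices quoted per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


def monthly_cost(cost_per_request: float, requests_per_day: int,
                 days: int = 30) -> float:
    """Scale a per-request cost to a month (30 days is an assumption)."""
    return cost_per_request * requests_per_day * days


# The example above: 1,000 input + 500 output tokens at $2.50 / $15.00 per 1M.
per_request = request_cost(1_000, 500, 2.50, 15.00)
print(f"per request: ${per_request:.4f}")
print(f"per month:   ${monthly_cost(per_request, 1_000):.2f}")
```

Swapping in your own token counts and your provider's current prices gives a first-order estimate before any caching or retries are considered.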

Why Output Tokens Cost More

Input tokens are processed together in a single parallel pass. Output tokens are generated one at a time: each new token needs its own forward pass that depends on every token before it, so the work cannot be parallelized the same way. This makes generation slower and more expensive to serve per token.

For most providers, output pricing is 3–6× higher than input pricing for equivalent models.
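
To see what that price gap means in practice, the sketch below computes how much of a request's cost comes from output tokens. It reuses the example figures from earlier in this article (1,000 input and 500 output tokens at $2.50 / $15.00 per 1M); the function name is an assumption for illustration.

```python
def output_share(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Fraction of a request's cost that comes from output tokens.

    The per-million scaling cancels out in the ratio, so it is omitted.
    """
    input_cost = input_tokens * input_price_per_m
    output_cost = output_tokens * output_price_per_m
    return output_cost / (input_cost + output_cost)


share = output_share(1_000, 500, 2.50, 15.00)
token_fraction = 500 / (1_000 + 500)
print(f"output is {token_fraction:.0%} of tokens but {share:.0%} of cost")
```

In this example, a third of the tokens accounts for three quarters of the bill, which is why response length is usually the first thing worth measuring.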

Pricing Tiers

Tier	Examples	Best for
Budget	GPT-5.4 mini, Claude Haiku 4.5	Chatbots, classification, extraction
Workhorse	GPT-5.4, Claude Sonnet 4.6, Gemini 3.5 Flash	Agentic workflows, coding, writing
Premium	GPT-5.5, Claude Opus 4.8	Complex reasoning, max quality

At 1,000 requests/day with 1K input + 500 output tokens per request, monthly costs fall roughly in these ranges:

Budget: ~$50–$105/month
Workhorse: ~$300–$560/month
Premium: ~$750–$1,500/month
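
For comparison with your own per-request numbers, the monthly ranges above can be converted back into an implied cost per request. This is only arithmetic on the figures quoted above, assuming a 30-day month; the names are illustrative.

```python
# 1,000 requests/day over an assumed 30-day month.
REQUESTS_PER_MONTH = 1_000 * 30

# Monthly ranges (in dollars) quoted above for each tier.
MONTHLY_RANGES = {
    "Budget": (50, 105),
    "Workhorse": (300, 560),
    "Premium": (750, 1_500),
}


def implied_cost_per_request(monthly_dollars: float) -> float:
    """Per-request cost implied by a monthly total at the volume above."""
    return monthly_dollars / REQUESTS_PER_MONTH


for tier, (low, high) in MONTHLY_RANGES.items():
    low_cost = implied_cost_per_request(low)
    high_cost = implied_cost_per_request(high)
    print(f"{tier}: ${low_cost:.4f} to ${high_cost:.4f} per request")
```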

What Actually Drives Your Bill

System prompt length. A 2,000-token system prompt adds $0.005 per request on a workhorse model. At 10,000 requests/day, that's an extra $1,500/month. Keep system prompts concise or use prompt caching.
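
The system prompt figure can be reproduced with a short calculation. This sketch assumes the $2.50 per 1M input price from the GPT-5.4 example as the workhorse rate and a 30-day month; the function name is hypothetical.

```python
def system_prompt_monthly_cost(prompt_tokens: int, input_price_per_m: float,
                               requests_per_day: int, days: int = 30) -> float:
    """Monthly cost of re-sending the same system prompt on every request."""
    per_request = prompt_tokens / 1_000_000 * input_price_per_m
    return per_request * requests_per_day * days


# 2,000-token system prompt, $2.50 per 1M input tokens, 10,000 requests/day.
cost = system_prompt_monthly_cost(2_000, 2.50, 10_000)
print(f"system prompt overhead: ${cost:,.2f}/month")
```

Because the cost is linear in prompt length, trimming the prompt by a quarter trims this line item by a quarter as well.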

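Conversation history. Every prior message in a multi-turn chat is re-sent as input on each turn. In a 10-turn conversation that adds 200 tokens per turn, turn 10 alone sends about 2,000 input tokens, and because every turn re-sends what came before, the whole conversation is billed for roughly 11,000 input tokens in total. Implement context trimming or summarization for long conversations.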
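
The growth of re-sent history, and one simple way to cap it, can be sketched as follows. This is a minimal illustration rather than a recommended implementation: it assumes each turn adds a fixed number of tokens, and `trim_history` takes a caller-supplied `count_tokens` function because real token counts depend on the provider's tokenizer.

```python
from typing import Callable, List


def input_tokens_per_turn(turns: int, tokens_per_turn: int) -> List[int]:
    """Input tokens sent on each turn when the full history is re-sent."""
    return [tokens_per_turn * k for k in range(1, turns + 1)]


def trim_history(messages: List[str], max_tokens: int,
                 count_tokens: Callable[[str], int]) -> List[str]:
    """Keep the most recent messages whose combined size fits max_tokens."""
    kept: List[str] = []
    total = 0
    for message in reversed(messages):  # walk from newest to oldest
        size = count_tokens(message)
        if total + size > max_tokens:
            break  # stop at the first older message that does not fit
        kept.append(message)
        total += size
    kept.reverse()  # restore chronological order
    return kept


per_turn = input_tokens_per_turn(10, 200)
print("turn 10 input tokens:", per_turn[-1])
print("total input tokens billed:", sum(per_turn))
```

Trimming drops the oldest messages outright; summarization keeps some of their content at the price of an extra model call, so which one pays off depends on the conversation.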

Output verbosity. Models that respond verbosely cost more. Instructions like "respond concisely" or "limit to X words" directly reduce your output token bill.

Retrieval-Augmented Generation (RAG). Injecting retrieved documents into the prompt can add thousands of input tokens per request. Use semantic chunking and top-K retrieval limits to keep context lean.
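
A top-K limit combined with a token budget might look like the sketch below. It assumes the retriever has already produced (score, text) pairs, and it estimates tokens with a rough 4-characters-per-token heuristic that stands in for a real tokenizer; both function names are hypothetical.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    """Very rough estimate (about 4 characters per token); not a tokenizer."""
    return max(1, len(text) // 4)


def select_chunks(scored_chunks: List[Tuple[float, str]], top_k: int,
                  token_budget: int) -> Tuple[List[str], int]:
    """Pick up to top_k highest-scoring chunks that fit within token_budget."""
    selected: List[str] = []
    used = 0
    ranked = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)
    for _score, text in ranked:
        if len(selected) == top_k:
            break
        cost = estimate_tokens(text)
        if used + cost > token_budget:
            continue  # too large for the remaining budget; try the next chunk
        selected.append(text)
        used += cost
    return selected, used


chunks = [(0.9, "a" * 400), (0.8, "b" * 4000), (0.7, "c" * 200), (0.1, "d" * 40)]
picked, tokens_used = select_chunks(chunks, top_k=2, token_budget=200)
print(len(picked), "chunks selected, about", tokens_used, "tokens")
```

Skipping an oversized chunk instead of stopping lets a smaller, slightly lower-ranked chunk use the remaining budget; whether that trade is acceptable depends on how much ranking order matters for your application.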

Prompt Caching: The Hidden Discount

Anthropic and Google both offer prompt caching. If a large prefix (system prompt + static context) repeats across many requests, providers can cache it and charge 50–90% less for the cached portion.

For an application where 80% of the input is a static 4,000-token system prompt, a 50–90% discount on the cached portion reduces effective input cost by roughly 40–72%, so cutting it in half is a realistic target.
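
The effect of caching on input cost can be estimated with a one-line formula. The sketch below is a simplification under stated assumptions: every request hits the cache, and any extra charge a provider may apply for writing to the cache is ignored. The function name is illustrative.

```python
def effective_input_cost_ratio(cached_fraction: float,
                               cache_discount: float) -> float:
    """Input cost with caching, as a fraction of the uncached input cost.

    cached_fraction: share of input tokens served from the cache (0 to 1).
    cache_discount:  price reduction on cached tokens (0.5 means 50% off).
    """
    uncached_part = 1 - cached_fraction
    cached_part = cached_fraction * (1 - cache_discount)
    return uncached_part + cached_part


# 80% of the input is a static prefix; discounts at both ends of 50-90%.
for discount in (0.5, 0.9):
    ratio = effective_input_cost_ratio(0.8, discount)
    print(f"{discount:.0%} discount: pay {ratio:.0%} of the uncached input cost")
```

Real savings will be lower when traffic is sparse enough that cached prefixes expire between requests.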

Choosing the Right Model

Start cheaper. Build and test on a budget model. Only upgrade after quality tests justify the cost difference.
Profile your token usage. Use the LLM Token Estimator to measure real prompt lengths before committing to a model tier.
Compare at your actual volume. A model 30% more expensive per token can still be the right choice if it eliminates failed requests or reduces retries. Use the LLM Cost Comparator to model your specific workload.
Revisit quarterly. Model prices drop regularly. What was cost-prohibitive six months ago may now be affordable.
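
The point about retries can be made concrete with a small calculation. This sketch assumes failed requests are retried until one succeeds and that attempts are independent, so the expected number of attempts is 1 / success rate. The success rates and per-request costs below are hypothetical, chosen only to illustrate a model that is 30% more expensive per request.

```python
def cost_per_success(cost_per_request: float, success_rate: float) -> float:
    """Expected cost of one usable result when failures are retried."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_request / success_rate


# Hypothetical figures: the pricier model costs 30% more per request
# but produces a usable result more often.
cheaper = cost_per_success(0.010, success_rate=0.70)
pricier = cost_per_success(0.013, success_rate=0.95)
print(f"cheaper model: ${cheaper:.4f} per usable result")
print(f"pricier model: ${pricier:.4f} per usable result")
```

Measuring your own success rate on a representative sample is what makes this comparison meaningful; the numbers here only show the shape of the trade-off.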

Key Takeaways

Output tokens cost 3–6× more than input tokens — optimizing response length often has more impact than switching models.
System prompt and history are the biggest hidden cost drivers for conversational apps.
Prompt caching can halve input costs for apps with large static prefixes.
Budget-tier models are 5–20× cheaper than premium models — use the cheapest model that meets your quality bar.