GPT vs Claude vs Gemini — API Pricing Compared

Choosing an AI model for your application involves balancing capability, latency, and cost. This article compares the API pricing for the major providers as of mid-2026.

Pricing Overview

All prices are per million tokens (input / output). Providers charge separately for what you send and what the model generates.

Model	Input (per 1M)	Output (per 1M)	Context
GPT-4o	$2.50	$10.00	128K
GPT-4o mini	$0.15	$0.60	128K
GPT-4 Turbo	$10.00	$30.00	128K
Claude Opus 4	$15.00	$75.00	200K
Claude Sonnet 4	$3.00	$15.00	200K
Claude 3.5 Haiku	$0.80	$4.00	200K
Gemini 1.5 Pro	$1.25	$5.00	1M
Gemini 1.5 Flash	$0.075	$0.30	1M

Prices are informational and subject to change. Always verify at the provider's official pricing page.

Budget Tier: Mini / Haiku / Flash

For high-volume workloads where cost is the primary constraint:

GPT-4o mini — The most widely used budget option. Strong general-purpose performance, especially for English tasks.
Claude 3.5 Haiku — Fast, cost-efficient, notably good at structured output and code.
Gemini 1.5 Flash — The cheapest input pricing of the three. Excellent for tasks that fit within a large context.

At 1,000 req/day with 1,000 input + 500 output tokens, monthly costs are roughly:

GPT-4o mini: ~$10
Gemini 1.5 Flash: ~$3.50
Claude 3.5 Haiku: ~$33

Mid-Tier: GPT-4o / Sonnet / Gemini 1.5 Pro

These are the workhorse models for production applications:

GPT-4o — Broad benchmark leader, multimodal (vision + text), strong code generation.
Claude Sonnet 4 — Longer context (200K vs 128K), nuanced writing, strong instruction following.
Gemini 1.5 Pro — 1 million token context window; useful for long document analysis.

GPT-4o and Claude Sonnet 4 are priced similarly; Gemini 1.5 Pro is about half the cost of both.

Premium Tier: GPT-4 Turbo / Claude Opus

GPT-4 Turbo — Legacy premium option. Largely superseded by GPT-4o for most tasks at lower cost.
Claude Opus 4 — Anthropic's most capable model. At $15 input / $75 output, it is appropriate only for tasks where maximum quality justifies the cost.

Key Differences Beyond Price

Context window — Gemini 1.5 Pro (1M tokens) and Claude models (200K) handle much longer conversations or documents than GPT-4o (128K) without truncation.

Output length limits — Maximum output per request varies. Check provider documentation for your specific use case.

Caching — Anthropic and Google offer prompt caching discounts for repeated context. OpenAI offers automatic caching for certain prompts. For applications with a large fixed system prompt, this can halve effective input costs.

Rate limits — Free tier and early-access tier rate limits differ significantly. Factor in rate limits if building for burst traffic.

How to Choose

Start cheap, upgrade if needed. Test GPT-4o mini or Gemini 1.5 Flash first. Only upgrade to a premium model if quality testing reveals a meaningful gap for your specific task.
Long documents → Gemini 1.5 Pro or Claude. If you need to process books, legal contracts, or large codebases in one call, the 200K–1M context window is decisive.
Structured output → Claude Haiku or GPT-4o mini. Both are reliable for JSON extraction, classification, and entity recognition.
Complex reasoning → GPT-4o or Claude Sonnet 4. For math, multi-step logic, or nuanced writing, the mid-tier models are usually sufficient.