LLM token cost calculator
Last reviewed May 28, 2026 · SoftwareEstimator.com
To calculate LLM token cost, multiply input tokens by the model’s input rate and output tokens by its output rate, per million. At 2026 rates: Claude Opus 4.8 is $5 / $25 per million input/output (cached reads $0.50); Sonnet 4.6 is $3 / $15; GPT-5.4 is $2.50 / $15; Gemini 3.1 Pro is $2 / $12; and Gemini 2.5 Flash is $0.30 / $2.50. For a single chat that maths out trivially. For an agentic build it does not — cache reads make up over 97% of the tokens, so the real bill is driven by how many times the agent re-reads context, not by the code it writes.
2026 per-million token rates
Input / output per 1M tokens, with cached-read rates:
- → Claude Opus 4.8 — $5 / $25 (cache $0.50)
- → Claude Sonnet 4.6 — $3 / $15 (cache $0.30)
- → Claude Haiku 4.5 — $1 / $5 (cache $0.10)
- → OpenAI GPT-5.4 — $2.50 / $15 (cache $0.25)
- → Gemini 3.1 Pro — $2 / $12 (cache $0.20)
- → Gemini 2.5 Flash — $0.30 / $2.50 (cheapest viable agent tier)
Why pasting a string isn’t enough for a build
Standard token calculators price a block of text you already have. They can’t tell you what a whole project will cost, because an agent re-ingests the codebase, tool definitions, and its own context on every turn — and that re-reading, not the output, is the bill.
Note also that Claude Opus 4.7+ use a new tokenizer that consumes up to ~35% more tokens for the same text, so older token counts undercount on the latest models. To price a project rather than a paragraph, estimate the scope.
Frequently asked questions
How do I calculate the cost of an API call?
(input tokens × input rate + output tokens × output rate) ÷ 1,000,000, using the model’s per-million prices. Cached input tokens are billed at roughly 0.1× the input rate.
What is the cheapest LLM per token?
Among mainstream models, Gemini 2.5 Flash-Lite and Flash are the cheapest ($0.10–0.30 input per million); Claude Haiku 4.5 is the cheapest Claude tier at $1/$5.
Why is my agentic coding bill higher than a token calculator predicts?
Because cache reads are over 97% of agentic token volume. A calculator prices the text you paste; an agent re-reads the whole context on every turn, which dwarfs the visible output.
Related guides
Figures are industry-composite estimates for planning, not quotes — agentic token spend has 10×+ run-to-run variance. See the full methodology or run an estimate .