Published June 10, 2026
Claude API pricing in 2026 runs from $1 input / $5 output per million tokens (Haiku 4.5) up to $10 / $50 for the new frontier model Fable 5, with Opus 4.8 at $5 / $25 and Sonnet 4.6 at $3 / $15. Prompt caching cuts repeated input to a tenth of the rate, and the Batch API is 50% off. This guide covers the full current Claude API price list, the discounts that actually move your bill, and how to estimate cost before you ship.
For free ways onto the API see Claude Free Credits 2026; for the coding tool's pricing see Claude Code Pricing 2026.
How much does the Claude API cost?
Per-million-token rates, verified 2026-06-10 against Anthropic's pricing page:
| Model | Input /MTok | Output /MTok | Best for |
|---|---|---|---|
| Claude Haiku 4.5 | $1 | $5 | Fast, cheap, high-volume tasks |
| Claude Sonnet 4.6 | $3 | $15 | Balanced default for most apps |
| Claude Opus 4.8 | $5 | $25 | Complex reasoning and coding |
| Claude Fable 5 | $10 | $50 | Frontier, hardest long-horizon work |
Fable 5 is Anthropic's new top model (see Claude Fable 5); Opus 4.8 remains the recommended default for most complex work at half the price. Haiku 4.5 and Sonnet 4.6 handle the bulk of production traffic at a fraction of that cost.
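To make the table concrete, here is a minimal sketch that turns the rates into a per-request cost. The dictionary keys are short labels chosen for readability, not official API model IDs, and the function ignores caching and batch discounts.

```python
# Per-million-token (MTok) rates in USD, copied from the table above.
# Keys are readability labels, NOT official API model IDs.
PRICES_PER_MTOK = {
    "haiku-4.5": (1.00, 5.00),
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.8": (5.00, 25.00),
    "fable-5": (10.00, 50.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """On-demand USD cost of one request, before caching or batch discounts."""
    input_rate, output_rate = PRICES_PER_MTOK[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Same token counts on every model: 2,000 input + 500 output.
    for name in PRICES_PER_MTOK:
        print(f"{name:>10}: ${request_cost(name, 2_000, 500):.4f}")
```

For that 2,000-in / 500-out request this gives $0.0045, $0.0135, $0.0225 and $0.0450 from Haiku 4.5 up to Fable 5. Holding token counts fixed understates the gap for Opus 4.8 and Fable 5, since their tokenizer can emit more tokens for the same text.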
Prompt caching and batch: the two discounts that matter
The sticker rates above are the on-demand price. Two built-in levers cut real cost:
- Prompt caching: reads at 0.1x. Cache reads are billed at a tenth of the input rate, so a stable system prompt or RAG context reused across calls costs 90% less after it is cached. Writing the prefix to the cache costs 1.25x (5-minute cache) or 2x (1-hour cache) the input rate, charged when the prefix is written, not on every call.
- Batch API: 50% off. Submit non-urgent requests as a batch processed within a window of up to 24 hours and pay half the on-demand input and output rates. Ideal for evals, bulk classification, embeddings-style jobs, and overnight processing.
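The cache-write premium means caching only pays off once a prefix is reused. A minimal sketch of the break-even, measured in "uncached sends" of the prefix, assuming the first call writes the cache and every later call hits it before it expires:

```python
# Multipliers on the base input rate, as listed above.
CACHE_WRITE_MULTIPLIER = {"5m": 1.25, "1h": 2.0}
CACHE_READ_MULTIPLIER = 0.1


def prefix_cost(calls: int, ttl: str = "5m") -> float:
    """Cost of sending one shared prefix on `calls` requests, in units of
    'one uncached send'. Assumes call 1 writes the cache and every later
    call hits it before it expires."""
    if calls < 1:
        return 0.0
    return CACHE_WRITE_MULTIPLIER[ttl] + CACHE_READ_MULTIPLIER * (calls - 1)


def break_even_calls(ttl: str = "5m") -> int:
    """Fewest calls at which caching the prefix beats not caching it."""
    calls = 1
    while prefix_cost(calls, ttl) >= calls:  # uncached cost of N sends is N
        calls += 1
    return calls


if __name__ == "__main__":
    for ttl in ("5m", "1h"):
        print(ttl, "break-even at", break_even_calls(ttl), "calls;",
              f"100 calls cost {prefix_cost(100, ttl):.2f} vs 100.00 uncached")
```

Under these assumptions the 5-minute cache wins from the second call (1.25 + 0.1 = 1.35 < 2) and the 1-hour cache from the third (2.0 + 0.2 = 2.2 < 3). The longer TTL earns its higher write price only when calls are spaced more than five minutes apart.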
The two discounts stack: a batch workload with a high cache-hit rate can land well under half the sticker price.
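One way to see the stacking is to price a single response from its token counts. This sketch takes a plain dict keyed like the Messages API `usage` object (`input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`). It assumes `input_tokens` counts only uncached input, that every cache write used the same TTL, and, per the article, that the batch discount applies on top of the cache rates.

```python
def bill_from_usage(usage: dict, input_rate: float, output_rate: float,
                    batch: bool = False, write_multiplier: float = 1.25) -> float:
    """USD cost of one response from its usage counts and $/MTok rates.

    Assumes `input_tokens` excludes cached tokens (cache writes and reads
    are counted separately) and that all cache writes share one TTL.
    """
    def count(key: str) -> int:
        return usage.get(key) or 0  # treat missing or None as 0

    cost = (
        count("input_tokens") * input_rate
        + count("cache_creation_input_tokens") * input_rate * write_multiplier
        + count("cache_read_input_tokens") * input_rate * 0.1
        + count("output_tokens") * output_rate
    ) / 1_000_000
    # Assumption from the article: the 50% batch discount stacks on cache rates.
    return cost * 0.5 if batch else cost


if __name__ == "__main__":
    # Sonnet 4.6 ($3 / $15): 1,500-token cached prefix, 500 fresh input, 500 output.
    usage = {"input_tokens": 500, "cache_read_input_tokens": 1_500, "output_tokens": 500}
    print(f"on-demand: ${bill_from_usage(usage, 3.0, 15.0):.6f}")
    print(f"batch:     ${bill_from_usage(usage, 3.0, 15.0, batch=True):.6f}")
```

That request costs $0.009450 on demand and $0.004725 through the Batch API, about 35% of the $0.0135 it would cost with no cache hit and no batch discount.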
The tokenizer caveat most price lists skip
Opus 4.7 and newer models (including Opus 4.8 and Fable 5) use a tokenizer that can produce up to ~35% more tokens for the same text than older Claude models. That makes the effective cost per page of input higher than the per-token rate alone suggests, so when you compare Claude with another provider, run the same real text through both rather than comparing headline per-token numbers.
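A minimal sketch of the effect: price a fixed amount of text rather than a fixed number of tokens. It uses the ~750 words per 1,000 tokens rule of thumb and treats 35% as the worst case. The "rival" is a hypothetical provider at the same $5/MTok whose tokenizer is assumed to behave like the older Claude one.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb: ~750 words per 1,000 tokens


def text_cost(words: int, rate_per_mtok: float, token_inflation: float = 0.0) -> float:
    """USD to process `words` of text at a given $/MTok rate.

    token_inflation=0.35 models the worst case for Opus 4.7+ models:
    up to ~35% more tokens for the same text.
    """
    tokens = words / WORDS_PER_TOKEN * (1 + token_inflation)
    return tokens * rate_per_mtok / 1_000_000


if __name__ == "__main__":
    words = 1_000_000
    # Hypothetical rival at the same $5/MTok with an older-style tokenizer.
    print(f"rival @ $5:          ${text_cost(words, 5.0):.2f}")
    print(f"Opus 4.8 @ $5, +35%: ${text_cost(words, 5.0, 0.35):.2f}")
```

At the same headline input rate, a million words costs about $6.67 on the hypothetical rival and up to $9.00 on Opus 4.8, which is why the comparison has to be made on real text.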
How to estimate your Claude API bill
A quick model:
- Estimate input and output tokens per request (1,000 tokens is roughly 750 words; add ~35% for the newer-tokenizer models).
- Multiply by the model's input and output rates above.
- Subtract caching: if a large fixed prefix repeats, the first call pays the cache-write rate (1.25x or 2x input) on that prefix, and every later call pays 0.1x on it.
- If the workload is non-interactive, halve it with the Batch API.
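The four estimation steps above can be rolled into one rough calculator. This is a sketch under stated assumptions: token counts come from the 750-words-per-1,000-tokens rule (so the ~35% padding is applied to both input and output for newer-tokenizer models), the 5-minute cache write rate is used, and the cache is assumed to stay warm across every request. `Workload` and `estimate_bill` are illustrative names, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    requests: int
    input_tokens: int               # per request, including any shared prefix
    output_tokens: int              # per request
    cached_prefix_tokens: int = 0   # portion of input_tokens repeated every call
    new_tokenizer: bool = False     # Opus 4.7+, e.g. Opus 4.8 or Fable 5
    batch: bool = False             # non-interactive, so eligible for the Batch API


def estimate_bill(w: Workload, input_rate: float, output_rate: float,
                  token_inflation: float = 0.35, write_multiplier: float = 1.25) -> float:
    """Rough USD total for a workload at the given $/MTok rates."""
    if w.requests < 1:
        return 0.0
    # Step 1: token counts, padded if the newer tokenizer applies.
    scale = 1 + token_inflation if w.new_tokenizer else 1.0
    prefix = w.cached_prefix_tokens * scale
    fresh = (w.input_tokens - w.cached_prefix_tokens) * scale
    output = w.output_tokens * scale
    # Steps 2-3: apply rates; the prefix is written once, then read at 0.1x.
    input_units = (w.requests * fresh
                   + prefix * write_multiplier
                   + prefix * 0.1 * (w.requests - 1))
    total = (input_units * input_rate + w.requests * output * output_rate) / 1_000_000
    # Step 4: the Batch API halves the result.
    return total * 0.5 if w.batch else total


if __name__ == "__main__":
    chat = Workload(requests=10_000, input_tokens=2_000, output_tokens=500,
                    cached_prefix_tokens=1_500)
    print(f"Sonnet 4.6, 10k turns: ${estimate_bill(chat, 3.0, 15.0):,.2f}")
```

For 10,000 Sonnet 4.6 turns with a 1,500-token cached prefix this comes to about $94.51, against $135.00 with no caching. Real bills drift from this whenever the cache goes cold between requests.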
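For a Sonnet 4.6 chat app sending ~2k input + ~500 output tokens per turn, that is about $0.006 + $0.0075 = ~$0.0135 per turn before caching. Caching only discounts the input side, so the saving depends on how much of the 2k is a repeated prefix: about 30% if 1,500 tokens are cached, and at most about 40% even if all of it is, because the $0.0075 of output is unaffected.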
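A short check of that per-turn arithmetic, varying how much of the 2k input is a cached prefix. It uses Sonnet 4.6 rates and ignores the one-time cache write, which amortizes away over a long conversation.

```python
def turn_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0,
              input_rate: float = 3.0, output_rate: float = 15.0) -> float:
    """USD per chat turn at Sonnet 4.6 rates; cached_tokens are billed as
    cache reads (0.1x). Ignores the one-time cache write."""
    fresh = input_tokens - cached_tokens
    return (fresh * input_rate
            + cached_tokens * input_rate * 0.1
            + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    base = turn_cost(2_000, 500)  # matches the $0.0135 hand calculation
    for cached in (0, 1_000, 1_500, 2_000):
        cost = turn_cost(2_000, 500, cached)
        print(f"{cached:>5} cached: ${cost:.5f} per turn ({1 - cost / base:.0%} saved)")
```

Caching 1,000, 1,500 or all 2,000 input tokens saves about 20%, 30% and 40% per turn; the fixed output cost is what caps the saving.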
Claude API pricing vs OpenAI and Gemini
Claude is not the cheapest option; it is priced as a quality tier. For cost-sensitive workloads, Google Gemini has a permanent free API tier, and Groq and Cerebras run open-weight models far more cheaply (see Free LLM APIs in 2026). The practical pattern most teams use: route everyday traffic through cheaper or free models and reserve Claude (Opus 4.8 or Fable 5) for the queries where its reasoning and tool-use reliability clearly win.
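The routing pattern can be sketched as a router plus a blended-cost calculation. Everything here is hypothetical: `looks_hard` is a placeholder heuristic (a real system might use a classifier or a cheap model's own judgment), and "cheap-model" stands in for whichever everyday model you pick, assumed to cost $0 per request.

```python
from typing import Callable


def looks_hard(query: str) -> bool:
    """Placeholder heuristic: long prompts or refactor requests count as hard."""
    return len(query) > 4_000 or "refactor" in query.lower()


def make_router(is_hard: Callable[[str], bool], premium: str,
                everyday: str) -> Callable[[str], str]:
    """Return a function that picks a model label for each query."""
    def route(query: str) -> str:
        return premium if is_hard(query) else everyday
    return route


def blended_cost(premium_share: float, premium_cost: float, everyday_cost: float) -> float:
    """Average USD per request when `premium_share` of traffic goes to Claude."""
    return premium_share * premium_cost + (1 - premium_share) * everyday_cost


if __name__ == "__main__":
    route = make_router(looks_hard, premium="opus-4.8", everyday="cheap-model")
    print(route("Summarize this paragraph."), route("Refactor this module for async I/O."))
    # Opus 4.8 at 2k in / 500 out is $0.0225; assume the everyday model is free.
    print(f"${blended_cost(0.2, 0.0225, 0.0):.4f} per request on average")
```

Sending 20% of traffic to Opus 4.8 under these assumptions averages $0.0045 per request, the same as running everything on Haiku 4.5 at that request size.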
Frequently asked questions
How much does the Claude API cost in 2026? Per million tokens: Claude Haiku 4.5 is $1 input / $5 output, Sonnet 4.6 is $3 / $15, Opus 4.8 is $5 / $25, and the new Fable 5 frontier model is $10 / $50. Prompt caching bills cache reads at a tenth of the input rate, and the Batch API is 50% off.
Which Claude model is cheapest? Claude Haiku 4.5 at $1 input / $5 output per million tokens is the cheapest, good for fast high-volume tasks. Sonnet 4.6 ($3 / $15) is the balanced default. Opus 4.8 ($5 / $25) and Fable 5 ($10 / $50) cost more but handle the hardest reasoning and coding.
Is there a free Claude API tier? There is no permanent free Claude API tier, but new accounts get a small credit and Anthropic runs startup and open-source credit programs. See Claude Free Credits 2026 for every route. Google Gemini, Groq, and Cerebras offer genuinely free API tiers if you need $0 inference.
How does prompt caching reduce Claude API cost? Cache reads are billed at 0.1x the normal input rate. If your requests share a large stable prefix (a system prompt, instructions, or RAG context), caching it means the first call pays a one-time cache-write premium on that prefix (1.25x the input rate for the 5-minute cache, 2x for the 1-hour cache), and every later call that hits the cache pays a tenth. The 5-minute cache pays for itself from the second call, the 1-hour cache from the third.
Why does Claude seem to use more tokens than other models? Opus 4.7 and newer (including Opus 4.8 and Fable 5) use a tokenizer that can produce up to about 35% more tokens for the same text. The effective cost per page is higher than the headline per-token rate, so compare providers on the same real text.