Last verified: June 30, 2026. LLM prices change often; confirm on each provider's pricing page before relying on a figure below.
The cheapest LLM APIs in 2026 are DeepSeek, Google's Gemini 2.5 Flash-Lite, and OpenAI's GPT-5.4 nano, all with input prices near or below $0.30 per million tokens. At that price a million words of English input (roughly 1.3 million tokens) costs well under a dollar, so "too expensive" is rarely the real constraint anymore. This guide compares the cheapest paid LLM APIs by price per million tokens, separates the genuinely cheap from the merely mid-priced, and shows three levers that can cut a real bill by more than half. If your volume is small, the cheapest option may be free entirely, which is covered at the end.
Price per million tokens (approximate, June 2026)
Prices are approximate. Input and output tokens are billed separately, and output usually costs several times more than input.
| Model | Input ($ per 1M tokens) | Output ($ per 1M tokens) | Notes | Pricing page |
|---|---|---|---|---|
| OpenAI GPT-5.4 nano | ~$0.20 | ~$1.25 | Cheapest from OpenAI, capable for simple tasks | openai.com/api/pricing |
| Gemini 2.5 Flash-Lite | ~$0.10 | ~$0.40 | Very cheap, fast, multimodal | ai.google.dev/pricing |
| DeepSeek V4 Flash | ~$0.14 | ~$0.28 | Strong reasoning for the price; deep prompt-cache discount | api-docs.deepseek.com |
| GPT-5.4 mini | ~$0.75 | ~$4.50 | Step up in quality, still cheap | openai.com/api/pricing |
| Gemini 2.5 Flash | ~$0.30 | ~$2.50 | Workhorse balance of cost and quality | ai.google.dev/pricing |
| Mistral Small | ~$0.10 | ~$0.30 | Cheap EU-hosted option | mistral.ai/pricing |
| Llama (via Groq) | ~$0.10 to $0.60 | varies | Cheap and very fast inference | groq.com/pricing |
| Qwen (via providers) | ~$0.20 to $0.50 | varies | Strong multilingual, cheap | provider-dependent |
| Claude Haiku 4.5 | ~$1.00 | ~$5.00 | Pricier, but strong quality per token | anthropic.com/pricing |
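To turn the table into a bill estimate, multiply your input and output token counts by the matching per-million prices. The sketch below is illustrative only: the `PRICES` dictionary hard-codes a few of the approximate figures from the table, its keys are made-up labels rather than real API model IDs, and both the `estimate_cost` helper and the example workload are assumptions.

```python
# Approximate USD prices per 1M tokens as (input, output), copied from the
# table above. Keys are illustrative labels, not official API model IDs.
PRICES = {
    "gpt-5.4-nano": (0.20, 1.25),
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "deepseek-v4-flash": (0.14, 0.28),
    "gpt-5.4-mini": (0.75, 4.50),
    "gemini-2.5-flash": (0.30, 2.50),
    "claude-haiku-4.5": (1.00, 5.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload on one model."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical monthly workload: 50M input tokens and 10M output tokens.
    workload = (50_000_000, 10_000_000)
    # Print models from cheapest to most expensive for this workload.
    for name in sorted(PRICES, key=lambda m: estimate_cost(m, *workload)):
        print(f"{name:24s} ${estimate_cost(name, *workload):8.2f}")
```

Because output is priced higher than input, the ordering can change with the input-to-output mix, so it is worth running the estimate with your own token counts rather than comparing input prices alone.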
The cheapest tier: GPT-5.4 nano, Gemini Flash-Lite, DeepSeek
At the bottom of the price range, three models stand out. GPT-5.4 nano is OpenAI's cheapest, good for classification, routing, and simple extraction. Gemini 2.5 Flash-Lite undercuts it on price with strong speed and multimodal input. DeepSeek V4 Flash sits between them on input price, undercuts both on output, and punches far above its price on reasoning and code, with a deep prompt-cache discount (cached input drops to a few thousandths of a cent per million) that makes repeated context nearly free. For most high-volume, low-difficulty tasks, any of these three is the right default.
The mid tier: when cheap is not capable enough
When the cheapest models miss, the next tier up is still inexpensive. GPT-5.4 mini and Gemini 2.5 Flash are the workhorses: a few times the price of the cheapest tier, but markedly better at multi-step instructions and longer context. Mistral Small and Llama via Groq fill the same niche with an EU-hosting or speed advantage. The jump from "cheapest" to "workhorse" is usually a 3 to 8 times price increase, which is still trivial next to frontier-model pricing.
How to actually cut your LLM bill
Choosing a cheap model is only the first lever. Three more cut real costs:
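- Prompt caching. If you send the same long system prompt on every request, caching lets you pay full price for it once and a discounted cached rate on later calls while the cache stays warm. On a chat app with a big system prompt, where the prompt is most of each request's input tokens, this alone can remove a large share of the input cost.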
- Batch APIs. For non-urgent jobs (overnight processing, bulk extraction), most providers offer a batch endpoint at roughly half price in exchange for slower turnaround.
- Model routing. Send each request to the cheapest model that can handle it, and escalate to an expensive model only when needed. A gateway or router such as OpenRouter or LiteLLM reduces this to a small configuration change. The full pattern is in the guide on LLM gateways and routers.
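The routing lever can be sketched in a few lines, independent of any gateway. This is a minimal illustration, not how OpenRouter or LiteLLM work internally: `route`, the `Model` type, the stub models, and the `is_good_enough` check are all hypothetical names introduced here, and in a real app each model callable would wrap a provider SDK or gateway call.

```python
from typing import Callable, Sequence, Tuple

# A "model" is any callable that takes a prompt and returns text.
Model = Callable[[str], str]


def route(
    prompt: str,
    tiers: Sequence[Tuple[str, Model]],
    is_good_enough: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try models from cheapest to most expensive; return (name, answer).

    Escalates only when the cheaper answer fails the quality check.
    The last tier's answer is returned even if it fails the check.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    name, answer = "", ""
    for name, model in tiers:
        answer = model(prompt)
        if is_good_enough(answer):
            break
    return name, answer


if __name__ == "__main__":
    # Stub models standing in for real API calls (hypothetical behavior).
    cheap = lambda p: "" if "prove" in p else "label: billing"
    strong = lambda p: "a careful multi-step answer"
    tiers = [("cheap-model", cheap), ("strong-model", strong)]
    non_empty = lambda a: bool(a.strip())

    print(route("classify this ticket", tiers, non_empty))
    print(route("prove this lemma", tiers, non_empty))
```

The quality check is the hard part in practice. A non-empty test is only a placeholder; schema validation, a confidence score, or a task-type classifier are common stand-ins, and every escalation pays for both the cheap and the expensive call.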
Stacked together, caching plus batch plus routing routinely cut a bill by more than half without changing what the app does.
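A rough way to sanity-check the "more than half" claim for your own app is to model the three levers as multipliers on a baseline bill. Every default below is an assumption chosen for illustration, not a measured figure, and the model treats the levers as independent, which real pricing may not allow (for example, a provider may not combine cache and batch discounts). `stacked_bill` is a hypothetical helper.

```python
def stacked_bill(
    baseline: float,
    input_share: float = 0.6,         # assumed share of the bill spent on input tokens
    cached_fraction: float = 0.7,     # assumed share of input tokens served from cache
    cache_discount: float = 0.9,      # assumed discount on cached input tokens
    batch_fraction: float = 0.3,      # assumed share of traffic that can wait for batch
    batch_discount: float = 0.5,      # "roughly half price" for batch jobs
    routed_fraction: float = 0.5,     # assumed share of traffic a cheaper model can serve
    cheap_price_ratio: float = 0.25,  # assumed cheap-model price relative to the default
) -> float:
    """Estimate the bill after caching, batch, and routing are applied."""
    # Caching only reduces the input portion of the bill.
    input_cost = baseline * input_share
    output_cost = baseline * (1 - input_share)
    input_cost *= 1 - cached_fraction * cache_discount
    bill = input_cost + output_cost
    # Batch discounts apply to the share of requests that can run offline.
    bill *= 1 - batch_fraction * batch_discount
    # Routing moves a share of requests to a model at a fraction of the price.
    bill *= 1 - routed_fraction * (1 - cheap_price_ratio)
    return bill


if __name__ == "__main__":
    before = 1000.0  # hypothetical monthly bill in USD
    after = stacked_bill(before)
    print(f"before ${before:.2f}, after ${after:.2f}, saved {1 - after / before:.0%}")
```

With these assumed defaults a $1,000 bill drops to roughly $330. Setting the three fractions to zero returns the baseline, which makes it easy to see how much each lever contributes on its own.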
Cheapest of all: free first
For small volume, the cheapest LLM API is no API bill at all. Gemini, Groq, Cerebras, and several OpenRouter models have free tiers that cost nothing within their rate limits, which covers a lot of real applications. Start there, and move to a cheap paid model like GPT-5.4 nano or DeepSeek only when you outgrow the free rate limits or need guaranteed throughput. The complete list of free options is in the guide on free LLM APIs in 2026.
Frequently asked questions
What is the cheapest LLM API in 2026? DeepSeek and the smallest models from Google and OpenAI are the cheapest paid LLM APIs in 2026, with input prices around $0.10 to $0.30 per million tokens. Gemini 2.5 Flash-Lite, GPT-5.4 nano, and DeepSeek's V4 Flash model sit at the bottom of the price range while still being capable for most tasks.
Is it cheaper to use a free LLM API instead? For low volume, yes. Free tiers from Gemini, Groq, and OpenRouter cost nothing within their rate limits, so a small app may never need to pay. Paid APIs become worthwhile when you exceed free rate limits or need guaranteed throughput. Many builders start free and move to a cheap paid model only when they scale.
How do input and output token prices differ? Output tokens almost always cost more than input, from about 2 to 8 times more among the models in the table above. A model priced at $0.30 per million input tokens may charge $2.50 per million output tokens, roughly 8 times as much. For cost estimates, weight your expected output volume heavily, because generation is where the bill grows.
How can I cut my LLM API costs further? Three levers: use prompt caching to avoid re-paying for a repeated system prompt, use batch APIs for non-urgent jobs (often half price), and route each request to the cheapest model that can handle it instead of sending everything to a frontier model. Together these can cut a bill by more than half.
Is the cheapest LLM API good enough for production? Often yes. Models like Gemini Flash, GPT-5.4 mini, and DeepSeek handle classification, extraction, summarization, and most chat at a fraction of frontier-model cost. Reserve expensive models for hard reasoning and code. Routing cheap models for easy tasks and expensive ones only when needed is the standard production pattern.
Related guides
- Free LLM APIs in 2026: the no-cost option first
- LLM Gateway Guide: route to the cheapest model automatically
- LLM Router: smart routing for cost savings
- Groq Pricing 2026: per-model breakdown
- Claude API Pricing 2026: all models, caching, batch
- Best Open Source LLM 2026: self-host to drop the per-token cost
- Free AI Coding Assistants 2026: power them with a cheap API
Which model do you ship on? I keep this table current as providers change pricing. Reply if a price has shifted or a model belongs on the list.