Last updated: April 2026
If you are building anything with LLMs in 2026, you should not be paying for inference yet. Between Google's Gemini free tier, Groq, Cerebras, OpenRouter, and a dozen smaller providers, you can run real production workloads — chatbots, agents, research pipelines — for $0/month. This is the complete map of every free LLM API still active in April 2026, with rate limits, model access, expiry rules, and which ones do not even ask for a credit card. If you are stacking providers, this is also the order I would apply them in.
Comparison at a glance
| Provider | Models | Free quota | Card required | Best for |
|---|---|---|---|---|
| Google Gemini API | Gemini 2.5 Flash (free tier reduced 2025) | 1,500 req/day Flash, 10 RPM | No | Most accessible free baseline |
| Groq | Llama 3.1, Mixtral, Gemma 2 | 30 RPM, 6K TPM, 1,000 req/day | No | Speed-critical apps (315 TPS) |
| Cerebras | Llama 3.1 70B / 8B | 30 RPM, 60K TPM, 1M tokens/day | No | Very large prompts |
| NVIDIA NIM | Many open + proprietary | Free prototyping tier | Account required | Trying new models |
| OpenRouter | Aggregated (50+ models) | Several free-tier models | No | One key, many models |
| OpenAI | GPT-4o, GPT-5 family | $5 trial, expires in 3 months | Yes | Eval / one-off tests |
| Anthropic Claude | Claude 3.5/4 family | $5 trial; OSS program 6mo Max | Yes | Best-in-class reasoning |
| Mistral La Plateforme | Mistral Small / Large | Free trial credits | Yes | EU compliance |
| Cohere | Command R / R+ | Free trial credits | Yes | RAG-first stacks |
| DeepSeek | DeepSeek V3 / R1 | Generous free tier | Yes | Cheap reasoning |
| xAI Grok | Grok 2 / 3 | Limited free credits | Yes | X integration |
| Hugging Face | Open source models | Rate-limited free tier | No | Open weights inference |
| Together AI | 100+ open models | Small free credits | Yes | Open model fine-tuning |
Google Gemini API
- Models: Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.5 Flash-Lite, Gemini 2.0 family
- Free tier: Gemini 2.5 Flash on the free tier (Pro and Flash-Lite available on paid). 1,500 requests/day at 10 RPM. Google reduced free tier quotas significantly in late 2025 — confirm current limits at ai.google.dev before relying on the figures.
- Rate limits: Per-model RPM and TPM caps; Flash-Lite has the highest free RPM
- Card required: No. Google account only.
- How to start: ai.google.dev -> Get API Key -> immediate access
- Best for: Sustained workloads where you want the most "do whatever" capacity without paying
The most accessible free LLM API baseline in 2026 — even after late-2025 quota reductions, Gemini Flash at 1,500 req/day is enough for prototyping. For larger sustained volume, pair Gemini Flash with Cerebras (1M tokens/day) and Groq.
Groq
- Models: Llama 3.1 70B and 8B, Mixtral 8x7B, Gemma 2
- Free tier: 30 requests/minute, 6,000 tokens/minute, 1,000 requests/day — capped but workable for prototyping. Speed: ~315 tokens/sec on Llama 70B (unmatched)
- Differentiator: LPUs (Language Processing Units) deliver dramatically faster inference than GPU stacks. Sub-second responses for 70B-class models.
- Card required: No
- How to start: console.groq.com -> Sign up -> API key
- Best for: Real-time UX (voice, chat with streaming), high throughput batch jobs
If your application is latency-sensitive, Groq's free tier alone is often enough to run production until you hit real scale.
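The 30 RPM cap is the limit you will hit first in any loop, so it is worth pacing requests client-side rather than retrying on 429s. A minimal sketch of that pacing, with the clock and sleep hooks injectable so it can be tested without waiting:

```python
import time

class RpmThrottle:
    """Client-side pacing so a loop never exceeds a provider's RPM cap.

    Groq's free tier allows 30 requests/minute, i.e. one call every
    2 seconds. clock/sleeper are injectable for deterministic testing.
    """

    def __init__(self, rpm: int, clock=time.monotonic, sleeper=time.sleep):
        self.min_interval = 60.0 / rpm
        self._clock = clock
        self._sleep = sleeper
        self._last = float("-inf")  # first call never waits

    def wait(self) -> float:
        # Sleep just long enough to stay under the cap; return the delay.
        delay = max(0.0, self.min_interval - (self._clock() - self._last))
        if delay:
            self._sleep(delay)
        self._last = self._clock()
        return delay
```

Call `throttle.wait()` before each request to `api.groq.com`; the same class covers Cerebras, which happens to share the 30 RPM figure.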
Cerebras
- Models: Llama 3.1 70B, Llama 3.1 8B
- Free tier: 30 requests per minute, 60K tokens per minute, 1M tokens per day
- Differentiator: Wafer-scale chips designed for inference. Very long context handling and competitive throughput on Llama 3.1 70B.
- Card required: No
- How to start: cloud.cerebras.ai
- Best for: Long-context tasks (large documents, RAG), batch inference within daily quota
NVIDIA NIM (build.nvidia.com)
- Models: Wide selection — Llama, Mistral, NVIDIA-tuned models, vision models, embeddings
- Free tier: Free during prototyping. Conversion to production usually requires NVIDIA Inception or paid tier.
- Card required: Account required, no card for free tier
- How to start: build.nvidia.com -> sign in -> get API key
- Best for: Trying new model architectures before committing to a provider
OpenRouter
- Models: Aggregator routing to 50+ models from every major provider, plus several free-tier models hosted directly
- Free tier: Mistral 7B Free, Gemma 2 9B Free, several other free routes — strict rate limits but workable for evaluation
- Card required: No (for free models)
- Pricing: Pay-per-token for paid models, transparent unit economics
- How to start: openrouter.ai
- Best for: Single API key replacing 5+ provider integrations
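The "one key, many models" pitch translates to a single OpenAI-style request where only the model string changes. The model ids below are assumptions (OpenRouter's free routes carry a `:free` suffix, but the free catalog changes often); check the current list at openrouter.ai before hardcoding any of them.

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

# Assumed free-route ids; the ":free" suffix marks free routes, but
# verify the current catalog at openrouter.ai before hardcoding.
FREE_MODELS = [
    "mistralai/mistral-7b-instruct:free",
    "google/gemma-2-9b-it:free",
]

def chat_payload(model: str, prompt: str) -> dict:
    # OpenAI-style chat body; swapping providers is just a string change.
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask_openrouter(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(chat_payload(model, prompt)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the body is OpenAI-compatible, the same `chat_payload` works against Groq's and most other providers' endpoints with only the URL and key changed.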
OpenAI
- Models: GPT-4o, GPT-5 family, o-series reasoning, embeddings
- Free tier: Approximately $5 in credits on new accounts; they expire three months after activation and are usable across all models
- Card required: Yes (for any usage beyond initial trial)
- Stacking: OpenAI for Startups program (separate application) gives larger credit grants — see Free AI API Credits for details
- Best for: One-off evaluation. Not for production unless you are a paid user.
Anthropic Claude
- Models: Claude 4.x family (Opus, Sonnet, Haiku)
- Free tier: Around $5 in starter credits for new accounts
- Special program: Claude for Open Source (launched February 2026) — qualifying open source maintainers get six months of Claude Max 20x for free, total value $1,200, 10,000 total spots. This is the largest free Claude grant of 2026.
- Card required: Yes (for API access beyond trial)
- How to start: console.anthropic.com; for OSS apply at the Claude for Open Source program page
- Best for: Highest-quality reasoning when you have OSS standing or don't mind paying
Mistral La Plateforme
- Models: Mistral Small, Mistral Large, Codestral, Embed
- Free tier: Trial credits on signup, modest amount
- Card required: Yes
- Best for: EU-compliant workloads, multilingual generation
Cohere
- Models: Command R, Command R+, Embed, Rerank
- Free tier: Trial credits, generous for evaluation
- Card required: Yes
- Best for: RAG-first applications (Cohere's Rerank is particularly strong)
DeepSeek
- Models: DeepSeek V3, DeepSeek R1 (reasoning)
- Free tier: Generous free tier; one of the cheapest paid options when you exceed it
- Card required: Yes
- Best for: High-volume reasoning workloads at a fraction of the cost of comparable models
xAI Grok
- Models: Grok 2, Grok 3
- Free tier: Limited free credits, mostly an evaluation tier
- Card required: Yes
- Best for: Apps integrating with X (Twitter) where Grok's real-time data is the differentiator
Hugging Face Inference API
- Models: Thousands of open source models hosted on the Hub
- Free tier: Rate-limited free access; production use requires Inference Endpoints or PRO subscription
- Card required: No
- Best for: Trying open weights without setting up your own GPU
Together AI
- Models: 100+ open source models (Llama, Mixtral, Qwen, plus fine-tuning support)
- Free tier: Small starter credits
- Card required: Yes
- Best for: Fine-tuning your own model on open weights
How to stack free LLM APIs
If you want to run a real product on $0/month inference, the strategy is:
- Route general chat and instruction-following work through Gemini by default
- Use Groq for real-time voice or interactive flows where latency matters
- Use Cerebras for long-context RAG or document analysis
- Keep OpenRouter as a safety net: when a primary provider rate-limits you, route the same request through an OpenRouter free model
- Hold a small paid balance on OpenAI or Anthropic for the queries where you actually need GPT-5 or Claude Opus 4.x quality
Most production agents I have built run 90 percent through free tiers and only spend on the top 10 percent of high-stakes calls.
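The whole strategy above fits in a few lines of routing logic: try free providers in order, fall through on rate limits, and reserve the paid client for calls flagged high-stakes. This is a sketch with placeholder callables, not any particular router library; the provider functions are whatever clients you wire in.

```python
# Fallback router sketch. Each provider is a callable(prompt) -> str that
# raises RateLimited when its free quota is exhausted (placeholders you
# wire to real Gemini/Groq/Cerebras/OpenRouter clients).

class RateLimited(Exception):
    pass

def route(prompt: str, providers, paid, high_stakes: bool = False) -> str:
    if high_stakes:
        return paid(prompt)  # spend only on the top-tier calls
    for call in providers:
        try:
            return call(prompt)
        except RateLimited:
            continue  # this free tier is tapped out; try the next
    return paid(prompt)  # every free tier exhausted; burst to paid
```

In practice `providers` is your ordered free stack, e.g. `[gemini, groq, cerebras, openrouter_free]`, and the `high_stakes` flag is set by whatever heuristic decides a query deserves GPT-5 or Opus quality.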
Frequently asked questions
Is there a truly free LLM API I can use without a credit card? Yes. Google Gemini API, Groq, Cerebras, NVIDIA NIM (build.nvidia.com), and most OpenRouter free-tier models do not require a credit card to start. You sign up with an email or Google account and receive immediate API access with rate limits.
Which free LLM API is the most generous in 2026? Cerebras leads on raw daily volume — 1M tokens/day on Llama 3.1 70B with no credit card. Google Gemini API is the most accessible baseline (1,500 req/day Flash) after the late-2025 free-tier reductions. For raw speed, Groq is unmatched at ~315 tokens/sec on Llama 70B.
Can I combine multiple free LLM APIs to get more inference? Yes — and you should. Each provider has independent rate limits, so routing across Gemini + Groq + OpenRouter + Cerebras is a common practice that multiplies your free capacity. Use a router like OpenRouter, LiteLLM, or roll your own with a model registry.
Does OpenAI still give free credits to new accounts? Yes, but they are small — about $5 in credits that expire three months after activation, usable across all OpenAI models. The free tier is mainly for evaluation, not production. For production, look at startup programs (OpenAI for Startups) or stack alternatives like Gemini and Groq.
Is the Claude API free for open source maintainers? Anthropic's Claude for Open Source program (launched February 2026) gives qualifying open source maintainers six months of Claude Max 20x for free — a $1,200 value with 10,000 total spots. Beyond that, new accounts get a small credit grant (around $5).
Related guides
- Free AI API Credits — full credits-and-grants programs (OpenAI for Startups, Anthropic Startups, etc.)
- Free GPU Compute — running your own model when free APIs are not enough
- Free Cloud Credits for Developers — infra to host your inference
- Free Startup Credits 2026: Complete Guide — every credits program in one place
Which free tier are you on right now? I update this list as providers change quotas. Reply if a quota has shifted or a new provider deserves the table.