Last updated: June 2026
If you are building anything with LLMs in 2026, you should not be paying for inference yet. Between Google's Gemini free tier, Groq, Cerebras, OpenRouter, and a dozen smaller providers, you can run real production workloads - chatbots, agents, research pipelines - for $0/month. This is the complete map of every free LLM API still active in June 2026, with rate limits, model access, expiry rules, and which ones do not even ask for a credit card. If you are stacking, this is also the order I would apply in.
Comparison at a glance
| Provider | Models | Free quota | Card required | Best for |
|---|---|---|---|---|
| Google Gemini API | Gemini 2.5 Flash (free tier reduced 2025) | 1,500 req/day Flash, 10 RPM | No | Most accessible free baseline |
| Groq | Llama 3.1, Mixtral, Gemma 2 | 30 RPM, 6K TPM, 1,000 req/day | No | Speed-critical apps (315 TPS) |
| Cerebras | gpt-oss-120b, zai-glm-4.7 | 5 RPM, 30K TPM, 1M tokens/day | No | Very large prompts |
| NVIDIA NIM | Many open + proprietary | Free prototyping tier | Account required | Trying new models |
| OpenRouter | Aggregated (50+ models) | Several free-tier models | No | One key, many models |
| OpenAI | GPT-5 family | Data-sharing tokens; ~$5 trial inconsistent | Yes | Eval / one-off tests |
| Anthropic Claude | Opus 4.8, Sonnet 4.6, Haiku 4.5, Fable 5 | ~$5 trial; OSS 6mo Max | Yes | Best-in-class reasoning |
| Mistral La Plateforme | Mistral Small / Large | Free trial credits | Yes | EU compliance |
| Cohere | Command R / R+ | Free trial credits | Yes | RAG-first stacks |
| DeepSeek | DeepSeek V4-Flash / V4-Pro | Cheap pay-as-you-go | Yes | Cheap reasoning |
| xAI Grok | Grok (latest generation) | Limited free credits | Yes | X integration |
| Hugging Face | Open source models | Rate-limited free tier | No | Open weights inference |
| Together AI | 100+ open models | Small free credits | Yes | Open model fine-tuning |
Google Gemini API
- Models: Gemini 2.5 Flash, Gemini 2.5 Flash-Lite, and newer 3.x Flash preview models (Gemini 2.0 was retired June 2026)
- Free tier: Gemini 2.5 Flash on the free tier (Pro and Flash-Lite available on paid). 1,500 requests/day at 10 RPM. Google reduced free tier quotas significantly in late 2025 - confirm current limits at ai.google.dev before relying on the figures.
- Rate limits: Per-model RPM and TPM caps; Flash-Lite has the highest free RPM
- Card required: No. Google account only.
- How to start: ai.google.dev -> Get API Key -> immediate access
- Best for: Sustained workloads where you want the most "do whatever" capacity without paying
The most accessible free LLM API baseline in 2026 - even after late-2025 quota reductions, Gemini Flash at 1,500 req/day is enough for prototyping. For larger sustained volume, pair Gemini Flash with Cerebras (1M tokens/day) and Groq.
Groq
- Models: Llama 3.1 70B and 8B, Mixtral 8x7B, Gemma 2
- Free tier: 30 requests/minute (per-model limits vary; Llama 3.1 8B allows 14,400 requests/day), capped but workable for prototyping. Speed: hundreds of tokens/sec on Llama 70B (among the fastest)
- Differentiator: LPUs (Language Processing Units) deliver dramatically faster inference than GPU stacks. Sub-second responses for 70B-class models.
- Card required: No
- How to start: console.groq.com -> Sign up -> API key
- Best for: Real-time UX (voice, chat with streaming), high throughput batch jobs
If your application is latency-sensitive, Groq's free tier alone is often enough to run production until you hit real scale.
Cerebras
- Models: gpt-oss-120b, zai-glm-4.7 (free-tier roster changes, check the dashboard)
- Free tier: 5 requests per minute, 30K tokens per minute, 1M tokens per day
- Differentiator: Wafer-scale chips designed for inference. Very long context handling and competitive throughput.
- Card required: No
- How to start: cloud.cerebras.ai
- Best for: Long-context tasks (large documents, RAG), batch inference within daily quota
NVIDIA NIM (build.nvidia.com)
- Models: Wide selection - Llama, Mistral, NVIDIA-tuned models, vision models, embeddings
- Free tier: Free during prototyping. Conversion to production usually requires NVIDIA Inception or paid tier.
- Card required: Account required, no card for free tier
- How to start: build.nvidia.com -> sign in -> get API key
- Best for: Trying new model architectures before committing to a provider
OpenRouter
- Models: Aggregator routing to 50+ models from every major provider, plus several free-tier models hosted directly
- Free tier: ~28 free models (DeepSeek R1, Llama 3.3 70B, Qwen3 Coder, Gemma 3, and more) at 20 RPM and 50 requests/day, raised to 1,000/day after a one-time $10 credit purchase
- Card required: No (for free models)
- Pricing: Pay-per-token for paid models, transparent unit economics
- How to start: openrouter.ai
- Best for: Single API key replacing 5+ provider integrations
OpenAI
- Models: GPT-5 family, embeddings
- Free tier: Complimentary tokens if you opt in to share API traffic; the old automatic ~$5 credit is now inconsistent
- Card required: Yes (for any usage beyond initial trial)
- Stacking: OpenAI for Startups program (separate application) gives larger credit grants - see Free AI API Credits for details
- Best for: One-off evaluation. Not for production unless you are a paid user.
Anthropic Claude
- Models: Claude Opus 4.8, Sonnet 4.6, Haiku 4.5, plus the new Fable 5 frontier model
- Free tier: Around $5 in starter credits for new accounts
- Special program: Claude for Open Source (launched February 2026) - qualifying open source maintainers get six months of Claude Max 20x for free, total value $1,200, 10,000 total spots. This is the largest free Claude grant of 2026.
- Card required: Yes (for API access beyond trial)
- How to start: console.anthropic.com; for OSS apply at the Claude for Open Source program page
- Best for: Highest-quality reasoning when you have OSS standing or don't mind paying
Mistral La Plateforme
- Models: Mistral Small, Mistral Large, Codestral, Embed
- Free tier: Trial credits on signup, modest amount
- Card required: Yes
- Best for: EU-compliant workloads, multilingual generation
Cohere
- Models: Command R, Command R+, Embed, Rerank
- Free tier: Trial credits, generous for evaluation
- Card required: Yes
- Best for: RAG-first applications (Cohere's Rerank is particularly strong)
DeepSeek
- Models: DeepSeek V4-Flash, DeepSeek V4-Pro (thinking + non-thinking, 1M context)
- Free tier: Very cheap pay-as-you-go (V4-Flash from ~$0.14/M input); any free signup grant is small and not guaranteed
- Card required: Yes
- Best for: High-volume reasoning workloads at a fraction of the cost of comparable models
xAI Grok
- Models: Grok (latest generation)
- Free tier: Limited free credits, mostly an evaluation tier
- Card required: Yes
- Best for: Apps integrating with X (Twitter) where Grok's real-time data is the differentiator
Hugging Face Inference API
- Models: Thousands of open source models hosted on the Hub
- Free tier: Rate-limited free access; production use requires Inference Endpoints or PRO subscription
- Card required: No
- Best for: Trying open weights without setting up your own GPU
Together AI
- Models: 100+ open source models (Llama, Mixtral, Qwen, plus fine-tuning support)
- Free tier: Small starter credits
- Card required: Yes
- Best for: Fine-tuning your own model on open weights
How to stack free LLM APIs
If you want to run a real product on $0/month inference, the strategy is:
- Default route through Gemini for general chat and instruction-following work
- Use Groq for real-time voice or interactive flows where latency matters
- Cerebras for long-context RAG or document analysis
- OpenRouter as a safety net - when a primary provider rate-limits you, route the same request through an OpenRouter free model
- Paid burst capacity - keep a small paid balance on OpenAI or Anthropic for the queries where you actually need the latest GPT-5 or Claude Opus 4.8 / Fable 5 quality
For the routing layer itself, see LLM Gateway 2026 - comparison of OpenRouter, LiteLLM, Portkey, and Helicone as the orchestration plane. For provider-by-provider ranking on raw output quality, see Best Free LLM 2026. For self-hosted alternatives, HuggingFace Inference API covers the open-weights route.
Most production agents I have built run 90 percent through free tiers and only spend on the top 10 percent of high-stakes calls.
Frequently asked questions
Is there a truly free LLM API I can use without a credit card? Yes. Google Gemini API, Groq, Cerebras, NVIDIA NIM (build.nvidia.com), and most OpenRouter free-tier models do not require a credit card to start. You sign up with an email or Google account and receive immediate API access with rate limits.
Which free LLM API is the most generous in 2026? Cerebras leads on raw daily volume - 1M tokens/day with no credit card. Google Gemini API is the most accessible baseline (~1,500 req/day Flash) after the late-2025 free-tier reductions. For raw speed, Groq is among the fastest at hundreds of tokens/sec on Llama 70B.
Can I combine multiple free LLM APIs to get more inference? Yes - and you should. Each provider has independent rate limits, so routing across Gemini + Groq + OpenRouter + Cerebras is a common practice that multiplies your free capacity. Use a router like OpenRouter, LiteLLM, or roll your own with a model registry.
Does OpenAI still give free credits to new accounts? Not reliably. The old automatic ~$5 new-account credit is now inconsistent; OpenAI's current free route is complimentary tokens for sharing your API traffic. For production, use startup programs (OpenAI for Startups) or stack free tiers like Gemini and Groq.
Is the Claude API free for open source maintainers? Anthropic's Claude for Open Source program (launched February 2026) gives qualifying open source maintainers six months of Claude Max 20x for free - a $1,200 value with 10,000 total spots. Beyond that, new accounts get a small credit grant (around $5).
Related guides
- Free AI API Credits - full credits-and-grants programs (OpenAI for Startups, Anthropic Startups, etc.)
- Free GPU Compute - running your own model when free APIs are not enough
- Free Cloud Credits for Developers - infra to host your inference
- Free Startup Credits 2026: Complete Guide - every credits program in one place
Which free tier are you on right now? I update this list as providers change quotas. Reply if a quota has shifted or a new provider deserves the table.