OpenRouter Free Tier in 2026: All 28+ Free Models, Real Rate Limits, BYOK 1M Requests, and How to Set It Up

Last updated: May 2026

OpenRouter's free tier in May 2026 gives you 20 requests per minute and 50-1000 requests per day against 28+ free models — including DeepSeek R1, Llama 3.3 70B, Qwen3 Coder 480B (262K context), Gemma 3, and Google Gemini 2.0 Flash. No credit card required. A separate BYOK (Bring Your Own Key) program gives 1 million free routing requests per month when you use your own provider keys. This guide covers the exact rate-limit math, the current free model roster, BYOK setup, the variant syntax (:free, :nitro, :floor), upgrade triggers, and how OpenRouter free stacks against HuggingFace Inference and Groq.

For free tiers across all providers, see Free LLM API Credits. For OpenRouter's deeper role as a gateway, see LLM Gateway in 2026.

Free tier rate limits at a glance

Limit | Free tier | After $10 in lifetime credits
Per-minute | 20 requests | 20 requests
Per-day | 50 requests | 1,000 requests
Tokens per minute | Provider-dependent | Provider-dependent
Free models | All :free variants | All :free variants

The 20 RPM cap never changes: purchasing credits doesn't unlock higher per-minute throughput on free models. The daily limit is the lever: spending $10 once (the credit never expires) permanently raises the daily cap from 50 to 1,000 requests.

Token-per-minute caps depend on the underlying provider. DeepSeek R1's free hosting is throttled more tightly than Llama 3.3 70B's, and Google Gemini 2.0 Flash free has its own provider-side limits.

The 28+ free models worth knowing (May 2026)

Model | Context | Strength
Qwen3 Coder 480B (free) | 262K | Strongest free coding model
DeepSeek R1 (free) | 128K | Reasoning, math, GPT-4 class
DeepSeek V3 (free) | 128K | General-purpose, strong all-rounder
Meta Llama 4 Scout (free) | 10M | Largest context in any free model
Meta Llama 3.3 70B Instruct (free) | 128K | Solid all-purpose, well-supported
Meta Llama 3.1 8B Instruct (free) | 128K | Fast, cheap, low-VRAM-friendly
Qwen 2.5 7B Instruct (free) | 128K | Small Asian-language alternative
Google Gemma 3 12B (free) | 128K | Google's open model, safety-tuned
Google Gemini 2.0 Flash (free) | 1M | Multimodal, large context
Mistral Small 24B (free) | 128K | EU-hosted alternative
Phi-3 Medium (free) | 128K | Microsoft, strong-for-size
Nous Hermes 3 (free) | 128K | Fine-tuned Llama variant

The exact roster shifts month to month. Some models cycle in and out as upstream providers add or pull free hosting. Check openrouter.ai/models with the :free filter for the live list.
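You can also pull the live roster programmatically. A minimal sketch, assuming OpenRouter's public model-listing endpoint at /api/v1/models returns a JSON body with a data array of model objects (verify the response shape against the current docs); it just filters for IDs ending in :free:

import requests

# Public endpoint; listing models shouldn't require an API key (worth verifying).
resp = requests.get("https://openrouter.ai/api/v1/models", timeout=30)
resp.raise_for_status()

free_models = [m["id"] for m in resp.json()["data"] if m["id"].endswith(":free")]
print(f"{len(free_models)} free models currently listed:")
for model_id in sorted(free_models):
    print(" ", model_id)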

How to sign up and get an API key

  1. Go to openrouter.ai and sign up with email or GitHub. No credit card required.
  2. Open Settings → Keys in the dashboard.
  3. Click Create Key. Give it a name (e.g. "dev-laptop"). Copy the key — it starts with sk-or-.
  4. Point your code at the OpenAI-compatible endpoint:
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible API, so the official OpenAI SDK works unchanged.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter key
)

# The :free suffix routes the request to the model's free-hosted variant.
resp = client.chat.completions.create(
    model="deepseek/deepseek-r1:free",
    messages=[{"role": "user", "content": "Explain transformers in one paragraph."}],
)
print(resp.choices[0].message.content)
  5. Model ID format: vendor/model-name:variant. The :free suffix is the free variant; omit it for paid pricing.

That's the whole setup. No credit card, no email verification beyond signup, no application form.

Model variants — :free, :nitro, :floor

OpenRouter routes each request to one of multiple underlying providers (Together, Fireworks, DeepInfra, Hyperbolic, etc.). Variants control how that choice is made:

  • :free — picks free-tier hosting where available. Subject to free-tier rate limits and provider availability.
  • :nitro — picks the highest-throughput provider for the model. Useful when latency / TPS matters more than cost.
  • :floor — picks the cheapest available provider. Useful for paid usage where cost dominates.
  • :extended — picks a provider offering an extended context window on this model.
  • :thinking — turns on the model's reasoning mode (for models that support it like DeepSeek R1, GPT-5-thinking, Claude Sonnet 4 thinking).

Append the variant that fits the job. For free-tier usage: deepseek/deepseek-r1:free. For cost-minimized paid usage: meta-llama/llama-3.3-70b-instruct:floor.
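For illustration, here is how the suffixes slot into the model parameter. Whether a given model actually offers a particular variant depends on its upstream providers, so the :nitro and :floor IDs below are examples rather than guaranteed listings:

from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

# Same request shape; only the suffix changes the routing behavior.
MODELS = {
    "free": "deepseek/deepseek-r1:free",                          # free hosting, free-tier limits
    "cheapest_paid": "meta-llama/llama-3.3-70b-instruct:floor",   # lowest-cost paid provider
    "fastest_paid": "meta-llama/llama-3.3-70b-instruct:nitro",    # highest-throughput provider
}

resp = client.chat.completions.create(
    model=MODELS["free"],
    messages=[{"role": "user", "content": "Summarize the variant system in one sentence."}],
)
print(resp.choices[0].message.content)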

BYOK — 1 million free requests per month

OpenRouter's BYOK (Bring Your Own Key) program lets you point OpenRouter at your own provider keys (OpenAI, Anthropic, Google, Together, Groq, etc.) while still using OpenRouter's unified API and analytics.

The math: every customer gets 1,000,000 free BYOK requests per month. Above 1M, you pay 5% of the model's normal OpenRouter rate as a routing fee.
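To make the 5% routing fee concrete, here is the arithmetic as a small sketch; the request volume and per-request model cost below are made-up numbers for illustration, not OpenRouter pricing:

FREE_BYOK_REQUESTS = 1_000_000   # free BYOK routing requests per month
ROUTING_FEE_RATE = 0.05          # 5% of the model's normal OpenRouter rate

def monthly_routing_fee(requests_per_month: int, avg_cost_per_request: float) -> float:
    """Fee charged on BYOK traffic beyond the free 1M monthly requests."""
    billable = max(0, requests_per_month - FREE_BYOK_REQUESTS)
    return billable * avg_cost_per_request * ROUTING_FEE_RATE

# Example: 1.5M requests/month at a hypothetical $0.002 per request
# -> 500,000 billable requests * $0.002 * 5% = $50 routing fee
print(f"${monthly_routing_fee(1_500_000, 0.002):.2f}")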

When BYOK makes sense:

  • You already have credits or commitments at one or more providers (OpenAI commit, Anthropic credits, etc.) and want unified observability across them.
  • You want to mix free OpenRouter models with your own provider-paid models in the same code path.
  • You hit free-tier rate limits but still want to keep using OpenRouter's gateway features (analytics, retries, fallback).

Setup:

  1. Go to Settings → Provider Keys.
  2. Paste your OpenAI / Anthropic / Google / etc. API keys.
  3. Call OpenRouter as normal — model routing will use your provider key when configured.

When the free tier breaks (upgrade signals)

Stay on free if:

  • Daily volume fits the 50/1000 cap.
  • 20 RPM doesn't bite (no traffic spikes above 1 req/3 seconds).
  • The rotating :free roster covers your model needs.
  • No SLA requirement (free hosting can drop or be rate-limited by providers).

Add $10+ in credits when:

  • You need more than 50 free requests/day (the one-time $10 purchase permanently raises the daily cap to 1,000).
  • You want access to specific paid models (GPT-5, Claude Sonnet 4, etc.).
  • You want :nitro variants for production throughput.
  • You want to stop worrying about free-tier rate-limiting during traffic spikes.

Upgrade to BYOK when:

  • You already have provider credits / commitments and just want OpenRouter as a gateway.
  • 1M monthly requests fits your traffic — that's ~32K requests/day, plenty for most apps.

OpenRouter free vs HuggingFace Inference Providers (PRO)

Aspect | OpenRouter free | HF Inference Providers (PRO, $9/mo)
Cost | $0 (free tier) | $9/month
Models | 28+ free models | 15+ providers, 100+ models
Daily limit | 50 / 1,000 requests | 2M monthly Inference Provider credits
Per-minute limit | 20 RPM | Provider-dependent
Setup | 1 minute | 1 minute
Best for | Prototyping, hobby projects | ML teams already on Hub

For pure prototyping with no budget, OpenRouter free beats everything. For developers who use HuggingFace Hub for models / datasets / Spaces, HF PRO is competitive at $9/month.

OpenRouter free vs Groq direct

Groq has its own free tier (30 RPM, 6K TPM, 14,400 RPD), runs open-source models only, and is significantly faster (Llama 3.3 70B at 394 TPS on Groq vs ~80-150 TPS on OpenRouter free routing). For Llama / Qwen / DeepSeek workloads where speed matters, Groq direct beats OpenRouter free on latency.

OpenRouter free wins on model breadth: 28+ models, including Google Gemini 2.0 Flash, DeepSeek R1, and Qwen3 Coder 480B, not all of which are available on Groq. Use both: Groq for speed-critical paths, OpenRouter for breadth and fallback.
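Both expose OpenAI-compatible endpoints, so splitting traffic is mostly a matter of picking the right client per call. A minimal sketch, assuming Groq's OpenAI-compatible base URL (https://api.groq.com/openai/v1) and example model IDs on each side; verify both against the current docs:

from openai import OpenAI

# Speed-critical path: Groq direct (model ID is an example; check Groq's model list).
groq = OpenAI(base_url="https://api.groq.com/openai/v1", api_key="gsk_...")

# Breadth / fallback path: OpenRouter free tier.
openrouter = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

def chat(prompt: str, speed_critical: bool = False) -> str:
    client, model = (
        (groq, "llama-3.3-70b-versatile") if speed_critical
        else (openrouter, "deepseek/deepseek-r1:free")
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content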

Common mistakes with the OpenRouter free tier

  • Treating :free as production-grade. Free hosting can be rate-limited or paused by providers. For customer-facing production, add $10 of credits and use paid variants.
  • Hitting 20 RPM without backoff. Implement exponential backoff with retries. Bursts above 20 RPM will 429.
  • Forgetting the daily limit. 50 requests/day on the unfunded free tier vanishes fast in development. Drop $10 in credits early.
  • Ignoring model ID format. deepseek/deepseek-r1 (paid passthrough) and deepseek/deepseek-r1:free (free hosting) are different. Always include :free for free tier.
  • Single-model dependency on free. Free models can drop out of the roster. Use a fallback chain (LiteLLM / OpenRouter's auto routing); a minimal sketch follows this list.
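Here is a minimal sketch combining the last two fixes: exponential backoff on 429s plus a fallback chain of free models. The model IDs, ordering, and retry counts are illustrative choices, and OpenRouter's built-in auto routing or LiteLLM can do the same job with less code:

import time
from openai import OpenAI, RateLimitError

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

# Illustrative chain of free models; adjust to whatever the live :free roster offers.
FALLBACK_CHAIN = [
    "deepseek/deepseek-r1:free",
    "meta-llama/llama-3.3-70b-instruct:free",
    "qwen/qwen-2.5-7b-instruct:free",
]

def chat_with_fallback(prompt: str, max_retries: int = 3) -> str:
    for model in FALLBACK_CHAIN:
        for attempt in range(max_retries):
            try:
                resp = client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": prompt}],
                )
                return resp.choices[0].message.content
            except RateLimitError:
                # Exponential backoff: 1s, 2s, 4s, then move to the next free model.
                time.sleep(2 ** attempt)
    raise RuntimeError("All free models rate-limited; add credits or try again later.")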

What the free tier is not for

  • High-volume production at >1000 req/day per user.
  • Strict-SLA workloads (free hosting has no uptime guarantee).
  • Frontier closed models (GPT-5, Claude Opus 4.7 — not available free).
  • Use cases that need :nitro speed or :extended context.

For all of those, $10 in credits is the right next step.
