Unified API gateway to 200+ LLMs from a single OpenAI-compatible endpoint. Model switching, automatic fallbacks, real-time cost comparison, and provider routing — all with one API key.
OpenRouter is a unified API gateway that aggregates 200+ LLMs from OpenAI, Anthropic, Google, Meta, Mistral, Cohere, and dozens of other providers and model hosts. You get a single OpenAI-compatible endpoint (https://openrouter.ai/api/v1) and one API key that works for all models. Key features: real-time pricing comparison across providers, automatic fallback when a provider is down, provider routing (pick fastest, cheapest, or most reliable), and per-request cost tracking.
```python
import openai

# OpenRouter uses the OpenAI SDK with a custom base URL
client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-v1-...",  # OpenRouter API key
)

# Use any model by its OpenRouter name
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={
        "HTTP-Referer": "https://yourapp.com",  # optional, for analytics
        "X-Title": "My App",  # optional, appears in dashboard
    },
)
print(response.choices[0].message.content)

# Other models — same code, different model string:
# "anthropic/claude-3-haiku"
# "meta-llama/llama-3.1-8b-instruct"
# "google/gemini-flash-1.5"
# "mistralai/mistral-7b-instruct"
# "qwen/qwen-2.5-72b-instruct"
```
```python
import requests

api_key = "sk-or-v1-..."  # OpenRouter API key

# Get the current model list with pricing
models = requests.get(
    "https://openrouter.ai/api/v1/models",
    headers={"Authorization": f"Bearer {api_key}"},
).json()["data"]

# Sort by input cost (pricing values are strings, in USD per token)
by_cost = sorted(models, key=lambda m: float(m.get("pricing", {}).get("prompt", 999)))
for m in by_cost[:5]:
    price = m.get("pricing", {})
    print(f"{m['id']}: ${float(price.get('prompt', 0)) * 1e6:.3f}/1M input tokens")

# Output (approximate, prices change):
# meta-llama/llama-3.2-1b-instruct:free: $0.000/1M
# google/gemma-2-9b-it:free: $0.000/1M
# meta-llama/llama-3.1-8b-instruct: $0.055/1M
# mistralai/mistral-7b-instruct: $0.055/1M
# google/gemini-flash-1.5: $0.075/1M
```
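Since the pricing values come back as USD-per-token strings, a small helper can turn a model's pricing entry plus token counts into a dollar estimate. A minimal sketch (`estimate_cost` is illustrative, not part of any SDK; the rates shown are made-up examples):

```python
def estimate_cost(pricing: dict, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate request cost in USD from an OpenRouter pricing entry.

    The /models endpoint reports prices as strings in USD per token.
    """
    input_rate = float(pricing.get("prompt", 0))
    output_rate = float(pricing.get("completion", 0))
    return prompt_tokens * input_rate + completion_tokens * output_rate

# Example with illustrative rates ($0.055/1M tokens in and out):
pricing = {"prompt": "0.000000055", "completion": "0.000000055"}
cost = estimate_cost(pricing, prompt_tokens=2000, completion_tokens=500)
print(f"Estimated cost: ${cost:.6f}")  # 2500 tokens * $0.000000055/token = $0.0001375
```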
```python
# OpenRouter automatically falls back to the next provider if one is down.
# You can also specify an explicit fallback chain:
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    extra_body={
        "route": "fallback",
        "models": [  # try in order
            "openai/gpt-4o",
            "anthropic/claude-3.5-sonnet",
            "google/gemini-pro-1.5",
        ],
    },
)

# The response includes which model was actually used:
# response.model == "openai/gpt-4o" or whichever succeeded
print(f"Used model: {response.model}")
```
```python
# Use OpenRouter to route to the cheapest providers for a given model
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Summarise this document..."}],
    extra_body={
        "provider": {
            "order": ["Fireworks", "Together", "Groq"],  # try cheapest first
            "allow_fallbacks": True,
        }
    },
)

# Track cost per request via usage metadata
usage = response.usage
print(f"Tokens: {usage.prompt_tokens} in, {usage.completion_tokens} out")

# The dollar cost of a completed request is available from the
# /api/v1/generation?id=<generation id> endpoint
```
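As a sketch of that lookup (assuming the documented `GET /api/v1/generation` endpoint; the `fetch_generation_stats` helper and the `total_cost` field name are assumptions worth verifying against the current docs):

```python
OPENROUTER_API = "https://openrouter.ai/api/v1"

def generation_stats_url(generation_id: str) -> str:
    """Build the URL that returns cost/latency stats for one request."""
    return f"{OPENROUTER_API}/generation?id={generation_id}"

def fetch_generation_stats(generation_id: str, api_key: str) -> dict:
    """Fetch metadata (including cost) for a completed generation."""
    import requests  # imported lazily; only needed when actually fetching

    resp = requests.get(
        generation_stats_url(generation_id),
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["data"]

# Usage (requires a real generation id and API key):
# stats = fetch_generation_stats(response.id, "sk-or-v1-...")
# print(stats["total_cost"])  # field name per OpenRouter docs; verify
```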
```python
# Route to specific providers based on your needs

# Fastest (lowest latency) — Groq is typically fastest
fast_response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    extra_body={"provider": {"order": ["Groq"], "allow_fallbacks": True}},
    messages=[{"role": "user", "content": "What is 2+2?"}],
)

# Most reliable (highest uptime SLA)
reliable_response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    extra_body={"provider": {"order": ["OpenAI", "Azure"], "allow_fallbacks": True}},
    messages=[{"role": "user", "content": "Classify this email..."}],
)

# Data privacy (only providers with specific data policies)
private_response = client.chat.completions.create(
    model="anthropic/claude-3-haiku",
    extra_body={"provider": {"data_collection": "deny"}},  # opt out of training
    messages=[{"role": "user", "content": "Here is our private data..."}],
)
```
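The three routing profiles above can be collected into a small client-side helper so call sites just name the profile. The preset names and dict shapes mirror the snippets; the helper itself is illustrative, not an OpenRouter feature:

```python
ROUTING_PRESETS = {
    "fast": {"provider": {"order": ["Groq"], "allow_fallbacks": True}},
    "reliable": {"provider": {"order": ["OpenAI", "Azure"], "allow_fallbacks": True}},
    "private": {"provider": {"data_collection": "deny"}},
}

def routing_body(profile: str) -> dict:
    """Return the extra_body dict for a named routing profile."""
    try:
        return ROUTING_PRESETS[profile]
    except KeyError:
        raise ValueError(f"unknown routing profile: {profile!r}")

# Usage:
# client.chat.completions.create(..., extra_body=routing_body("fast"))
```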
Two gotchas. First, models must be referenced in provider/model-name format (openai/gpt-4o, not just gpt-4o); the wrong format returns a 404, so check the OpenRouter model list for exact IDs. Second, free models exist (marked with a :free suffix) but have low rate limits, so don't use free-tier models in production. OpenRouter abstracts away provider outages and rate limits by intelligently routing requests: if OpenAI is down, a request can fail over to Anthropic; if Azure is slow, it can try Google. This resilience is valuable for production systems, though latency may vary by provider. Pricing generally mirrors each underlying provider's rates, and open models are often cheaper through alternate hosts than through a single vendor's API.
Cost optimization tips: use lower-tier models for simple tasks (routing to a small open model such as Llama instead of GPT-4 can cut costs 10x or more). Cache long prompts (system messages, documents) to avoid re-processing. Monitor per-model performance and accuracy so you're not paying for overkill. OpenRouter's dashboard shows real-time costs and fallback rates, helping you optimize spend.
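One way to act on the first tip is a client-side tiering rule that sends short, simple prompts to a cheap model and escalates the rest. A minimal sketch, where the heuristic, keyword list, and model choices are all assumptions to tune for your workload:

```python
CHEAP_MODEL = "meta-llama/llama-3.1-8b-instruct"
STRONG_MODEL = "openai/gpt-4o"

def pick_model(prompt: str, max_cheap_chars: int = 500) -> str:
    """Crude routing heuristic: short prompts without analysis keywords
    go to the cheap model; everything else escalates to the strong one."""
    needs_power = any(kw in prompt.lower() for kw in ("analyze", "prove", "refactor"))
    if len(prompt) <= max_cheap_chars and not needs_power:
        return CHEAP_MODEL
    return STRONG_MODEL

print(pick_model("What is 2+2?"))                # cheap model
print(pick_model("Analyze this contract: ..."))  # strong model
```

In production you would replace the keyword check with a classifier or confidence score, but the shape stays the same: decide first, then pass the chosen string as `model=`.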
Illustrative comparison (prices, latencies, and availability are approximate and change over time):

| Model (via OpenRouter) | Input cost (per 1M tokens) | Output cost (per 1M tokens) | Latency (p50) | Availability |
|---|---|---|---|---|
| GPT-4 Turbo | $10 | $30 | 2–5s | 99.9% |
| Claude 3 Opus | $15 | $75 | 1–3s | 99.95% |
| Llama 2 70B | $0.81 | $1.08 | 500–1500ms | 98% |
| Mistral 7B | $0.14 | $0.42 | 200–500ms | 97% |
OpenRouter API integration patterns: Most teams use OpenRouter via a standard OpenAI-compatible client library (LangChain, LlamaIndex, or Python requests with a simple wrapper). The API key is your authentication; requests route transparently to backend providers. OpenRouter accepts the standard sampling parameters (e.g., "temperature", "top_p" for nucleus sampling) and maps them to each backend's capabilities. If a provider doesn't support a parameter, OpenRouter silently ignores it, keeping client code portable.
Advanced patterns include provider hints (prefer Claude, fallback to GPT-4), custom retry policies (retry 3 times on timeout), and per-user cost caps (prevent runaway spending). OpenRouter's documentation and community Discord are active; issues are resolved quickly. For teams moving from direct OpenAI/Anthropic APIs to OpenRouter, the migration is low-friction because the client code is nearly identical. Monitoring dashboards show per-provider costs and success rates in real time.
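Custom retry policies of the kind mentioned above are typically implemented client-side. A minimal exponential-backoff sketch (the `retry_call` helper is illustrative, not part of any SDK):

```python
import time

def retry_call(fn, retries: int = 3, base_delay: float = 0.5):
    """Call fn(), retrying up to `retries` times with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...

# Usage:
# response = retry_call(lambda: client.chat.completions.create(...))
```

In practice you would catch only retryable errors (timeouts, 429s, 5xx) rather than bare `Exception`, and add jitter to the delay.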
OpenRouter use cases and adoption: Common use cases include: (1) cost-sensitive applications (use cheaper models by default, escalate to GPT-4 only when necessary), (2) high-reliability applications (failover if one provider is down), (3) multi-modal applications (route images to a vision-capable model, text to GPT-4, code to a code-specialized model), and (4) experimental applications (test multiple models on the same data). Startups often adopt OpenRouter to avoid vendor lock-in; if they later want to switch providers, the code changes are minimal.
Monitoring and debugging: OpenRouter provides usage logs (via dashboard or API) showing per-model costs, latencies, and error rates. Set up cost alerts if spending exceeds a threshold. Track per-endpoint error rates (some models fail more often than others on specific tasks) and adjust routing policies accordingly. Use the models fallback list (shown earlier) to ensure critical requests always complete, even if the primary provider is down.
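A cost alert can be as simple as summing recent per-request costs pulled from the usage logs and comparing against a cap. A hypothetical sketch (the record shape with a `total_cost` field is an assumption about your log format):

```python
def over_budget(usage_records: list, daily_cap_usd: float) -> bool:
    """Return True if summed per-request costs exceed the daily cap."""
    total = sum(r.get("total_cost", 0.0) for r in usage_records)
    return total > daily_cap_usd

records = [{"total_cost": 0.12}, {"total_cost": 0.30}, {"total_cost": 0.05}]
print(over_budget(records, daily_cap_usd=0.40))  # True: $0.47 > $0.40
```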
One pitfall: assuming all models understand the same parameters. For example, `top_k` means different things in different implementations. OpenRouter normalizes these to a standard set, but knowing the mapping helps when debugging unexpected model behavior. Documentation is thorough, and the support team is responsive.
OpenRouter provides detailed cost comparison tools to help optimize inference expenses across different model providers and versions. Understanding pricing structures—per-token vs. per-request models, batch discounts, and regional pricing variations—enables informed decisions about cost-performance trade-offs. The platform facilitates A/B testing different models to find the best balance between quality and cost for your specific use case. Real-time pricing information helps you select the most economical option without sacrificing model quality.
Beyond simple cost comparison, OpenRouter's APIs support sophisticated routing strategies. Implementing fallback chains that gracefully degrade when preferred models are unavailable maintains application reliability. Weight-based routing, implemented on the client side, enables gradually rolling out new models while monitoring quality metrics, supporting production-grade deployments at scale.
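Weight-based rollout is straightforward to sketch client-side; OpenRouter itself never needs to know about the weights. Here roughly 10% of traffic goes to a candidate model (the split and model names are illustrative):

```python
import random

def choose_model(weights, rng=None):
    """Pick a model id with probability proportional to its weight."""
    rng = rng or random.Random()
    models = list(weights)
    return rng.choices(models, weights=[weights[m] for m in models], k=1)[0]

# Send ~10% of traffic to the candidate while monitoring quality metrics
rollout = {"openai/gpt-4o-mini": 0.9, "anthropic/claude-3-haiku": 0.1}
model = choose_model(rollout)
# client.chat.completions.create(model=model, ...)
```

Logging which model served each request alongside your quality metrics lets you ramp the candidate's weight up (or roll it back) with a one-line config change.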
```python
import openai
import requests

client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-v1-...",
)

# Cost-optimized model selection: "openrouter/auto" lets OpenRouter choose,
# and the provider "sort" hint prioritizes cheaper providers
response = client.chat.completions.create(
    model="openrouter/auto",
    messages=[{"role": "user", "content": "Your query here"}],
    extra_body={"provider": {"sort": "price"}},
)

# Manual cost optimization: fetch the model list and pick the cheapest
models = requests.get("https://openrouter.ai/api/v1/models").json()["data"]
models_by_cost = sorted(models, key=lambda m: float(m["pricing"]["prompt"]))
selected_model = models_by_cost[0]["id"]
```
Optimizing inference costs becomes increasingly important as applications scale. OpenRouter addresses this challenge by aggregating multiple providers and offering transparent pricing, enabling data-driven decisions about model selection and routing strategies that balance cost and quality effectively.
The API standardization across multiple providers is a major advantage of platforms like OpenRouter. Whether using Claude, GPT-4, Llama, or Mistral models, the API remains consistent. This consistency enables straightforward swapping between models to evaluate different options or handle provider-specific issues. Applications built against OpenRouter APIs port easily to any underlying model, providing valuable flexibility and reducing vendor lock-in concerns.
Monitoring and analytics features help teams understand usage patterns and optimize costs. Detailed logging of API calls, token usage, and latency measurements enables data-driven decisions about cost optimization. Identifying models that perform well for your use cases while minimizing cost becomes tractable with comprehensive analytics.
The reliability layer in OpenRouter includes automatic retries, fallback strategies, and rate limiting protection. Rather than handling these concerns in application code, OpenRouter manages them transparently. This abstraction layer improves application reliability while reducing operational complexity. Teams can focus on application logic rather than low-level API resilience patterns.
OpenRouter continues evolving to support new models and capabilities as they emerge. The platform acts as a rapid adoption mechanism for cutting-edge models, making latest capabilities immediately available to applications. This forward-looking design keeps applications current without major refactoring, supporting continuous improvement as the AI landscape evolves.