Unified API gateway to 200+ LLMs from a single OpenAI-compatible endpoint. Model switching, automatic fallbacks, real-time cost comparison, and provider routing — all with one API key.
OpenRouter is a unified API gateway that aggregates 200+ LLMs from OpenAI, Anthropic, Google, Meta, Mistral, Cohere, and dozens of other providers and model hosts. You get a single OpenAI-compatible endpoint (https://openrouter.ai/api/v1) and one API key that works for all models. Key features: real-time pricing comparison across providers, automatic fallback when a provider is down, provider routing (pick fastest, cheapest, or most reliable), and per-request cost tracking.
```python
import openai

# OpenRouter uses the OpenAI SDK with a custom base URL
client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-v1-...",  # OpenRouter API key
)

# Use any model by its OpenRouter name
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={
        "HTTP-Referer": "https://yourapp.com",  # optional, for analytics
        "X-Title": "My App",  # optional, appears in dashboard
    },
)
print(response.choices[0].message.content)

# Other models — same code, different model string:
# "anthropic/claude-3-haiku"
# "meta-llama/llama-3.1-8b-instruct"
# "google/gemini-flash-1.5"
# "mistralai/mistral-7b-instruct"
# "qwen/qwen-2.5-72b-instruct"
```
```python
import requests

api_key = "sk-or-v1-..."  # OpenRouter API key

# Get the current model list with pricing
models = requests.get(
    "https://openrouter.ai/api/v1/models",
    headers={"Authorization": f"Bearer {api_key}"},
).json()["data"]

# Sort by input cost (pricing values are strings, in USD per token)
by_cost = sorted(models, key=lambda m: float(m.get("pricing", {}).get("prompt", 999)))
for m in by_cost[:5]:
    price = m.get("pricing", {})
    print(f"{m['id']}: ${float(price.get('prompt', 0)) * 1e6:.3f}/1M input tokens")

# Output (approximate, prices change):
# meta-llama/llama-3.2-1b-instruct:free: $0.000/1M
# google/gemma-2-9b-it:free: $0.000/1M
# meta-llama/llama-3.1-8b-instruct: $0.055/1M
# mistralai/mistral-7b-instruct: $0.055/1M
# google/gemini-flash-1.5: $0.075/1M
```
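Since the pricing values come back as USD-per-token strings, a small helper can turn a model's pricing entry plus token counts into a dollar estimate. A minimal sketch (`estimate_cost` is illustrative, not part of any SDK; the rates shown are made-up examples):

```python
def estimate_cost(pricing: dict, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate request cost in USD from an OpenRouter pricing entry.

    The /models endpoint reports prices as strings in USD per token.
    """
    input_rate = float(pricing.get("prompt", 0))
    output_rate = float(pricing.get("completion", 0))
    return prompt_tokens * input_rate + completion_tokens * output_rate

# Example with illustrative rates ($0.055/1M tokens in and out):
pricing = {"prompt": "0.000000055", "completion": "0.000000055"}
cost = estimate_cost(pricing, prompt_tokens=2000, completion_tokens=500)
print(f"Estimated cost: ${cost:.6f}")  # 2500 tokens * $0.000000055/token = $0.0001375
```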
```python
# OpenRouter automatically falls back to the next provider if one is down.
# You can also specify an explicit fallback chain:
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    extra_body={
        "route": "fallback",
        "models": [  # try in order
            "openai/gpt-4o",
            "anthropic/claude-3.5-sonnet",
            "google/gemini-pro-1.5",
        ],
    },
)

# The response includes which model was actually used:
# response.model == "openai/gpt-4o" or whichever succeeded
print(f"Used model: {response.model}")
```
```python
# Use OpenRouter to route to the cheapest providers for a given model
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Summarise this document..."}],
    extra_body={
        "provider": {
            "order": ["Fireworks", "Together", "Groq"],  # try cheapest first
            "allow_fallbacks": True,
        }
    },
)

# Track cost per request via usage metadata
usage = response.usage
print(f"Tokens: {usage.prompt_tokens} in, {usage.completion_tokens} out")

# The dollar cost of a completed request is available from the
# /api/v1/generation?id=<generation id> endpoint
```
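As a sketch of that lookup (assuming the documented `GET /api/v1/generation` endpoint; the `fetch_generation_stats` helper and the `total_cost` field name are assumptions worth verifying against the current docs):

```python
OPENROUTER_API = "https://openrouter.ai/api/v1"

def generation_stats_url(generation_id: str) -> str:
    """Build the URL that returns cost/latency stats for one request."""
    return f"{OPENROUTER_API}/generation?id={generation_id}"

def fetch_generation_stats(generation_id: str, api_key: str) -> dict:
    """Fetch metadata (including cost) for a completed generation."""
    import requests  # imported lazily; only needed when actually fetching

    resp = requests.get(
        generation_stats_url(generation_id),
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["data"]

# Usage (requires a real generation id and API key):
# stats = fetch_generation_stats(response.id, "sk-or-v1-...")
# print(stats["total_cost"])  # field name per OpenRouter docs; verify
```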
```python
# Route to specific providers based on your needs

# Fastest (lowest latency) — Groq is typically fastest
fast_response = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",
    extra_body={"provider": {"order": ["Groq"], "allow_fallbacks": True}},
    messages=[{"role": "user", "content": "What is 2+2?"}],
)

# Most reliable (highest uptime SLA)
reliable_response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    extra_body={"provider": {"order": ["OpenAI", "Azure"], "allow_fallbacks": True}},
    messages=[{"role": "user", "content": "Classify this email..."}],
)

# Data privacy (only providers with specific data policies)
private_response = client.chat.completions.create(
    model="anthropic/claude-3-haiku",
    extra_body={"provider": {"data_collection": "deny"}},  # opt out of training
    messages=[{"role": "user", "content": "Here is our private data..."}],
)
```
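The three routing profiles above can be collected into a small client-side helper so call sites just name the profile. The preset names and dict shapes mirror the snippets; the helper itself is illustrative, not an OpenRouter feature:

```python
ROUTING_PRESETS = {
    "fast": {"provider": {"order": ["Groq"], "allow_fallbacks": True}},
    "reliable": {"provider": {"order": ["OpenAI", "Azure"], "allow_fallbacks": True}},
    "private": {"provider": {"data_collection": "deny"}},
}

def routing_body(profile: str) -> dict:
    """Return the extra_body dict for a named routing profile."""
    try:
        return ROUTING_PRESETS[profile]
    except KeyError:
        raise ValueError(f"unknown routing profile: {profile!r}")

# Usage:
# client.chat.completions.create(..., extra_body=routing_body("fast"))
```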
Two gotchas. First, models must be referenced in provider/model-name format (openai/gpt-4o, not just gpt-4o); the wrong format returns a 404, so check the OpenRouter model list for exact IDs. Second, free models exist (marked with a :free suffix) but have low rate limits, so don't use free-tier models in production. OpenRouter abstracts away provider outages and rate limits by intelligently routing requests: if OpenAI is down, a request can fail over to Anthropic; if Azure is slow, it can try Google. This resilience is valuable for production systems, though latency may vary by provider. Pricing generally mirrors each underlying provider's rates, and open models are often cheaper through alternate hosts than through a single vendor's API.
Cost optimization tips: use lower-tier models for simple tasks (routing to a small open model such as Llama instead of GPT-4 can cut costs 10x or more). Cache long prompts (system messages, documents) to avoid re-processing. Monitor per-model performance and accuracy so you're not paying for overkill. OpenRouter's dashboard shows real-time costs and fallback rates, helping you optimize spend.
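One way to act on the first tip is a client-side tiering rule that sends short, simple prompts to a cheap model and escalates the rest. A minimal sketch, where the heuristic, keyword list, and model choices are all assumptions to tune for your workload:

```python
CHEAP_MODEL = "meta-llama/llama-3.1-8b-instruct"
STRONG_MODEL = "openai/gpt-4o"

def pick_model(prompt: str, max_cheap_chars: int = 500) -> str:
    """Crude routing heuristic: short prompts without analysis keywords
    go to the cheap model; everything else escalates to the strong one."""
    needs_power = any(kw in prompt.lower() for kw in ("analyze", "prove", "refactor"))
    if len(prompt) <= max_cheap_chars and not needs_power:
        return CHEAP_MODEL
    return STRONG_MODEL

print(pick_model("What is 2+2?"))                # cheap model
print(pick_model("Analyze this contract: ..."))  # strong model
```

In production you would replace the keyword check with a classifier or confidence score, but the shape stays the same: decide first, then pass the chosen string as `model=`.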
Illustrative comparison (prices, latencies, and availability are approximate and change over time):

| Model (via OpenRouter) | Input cost (per 1M tokens) | Output cost (per 1M tokens) | Latency (p50) | Availability |
|---|---|---|---|---|
| GPT-4 Turbo | $10 | $30 | 2–5s | 99.9% |
| Claude 3 Opus | $15 | $75 | 1–3s | 99.95% |
| Llama 2 70B | $0.81 | $1.08 | 500–1500ms | 98% |
| Mistral 7B | $0.14 | $0.42 | 200–500ms | 97% |
OpenRouter API integration patterns: Most teams use OpenRouter via a standard OpenAI-compatible client library (LangChain, LlamaIndex, or Python requests with a simple wrapper). The API key is your authentication; requests route transparently to backend providers. OpenRouter accepts the standard sampling parameters (e.g., "temperature", "top_p" for nucleus sampling) and maps them to each backend's capabilities. If a provider doesn't support a parameter, OpenRouter silently ignores it, keeping client code portable.
Advanced patterns include provider hints (prefer Claude, fallback to GPT-4), custom retry policies (retry 3 times on timeout), and per-user cost caps (prevent runaway spending). OpenRouter's documentation and community Discord are active; issues are resolved quickly. For teams moving from direct OpenAI/Anthropic APIs to OpenRouter, the migration is low-friction because the client code is nearly identical. Monitoring dashboards show per-provider costs and success rates in real time.
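Custom retry policies of the kind mentioned above are typically implemented client-side. A minimal exponential-backoff sketch (the `retry_call` helper is illustrative, not part of any SDK):

```python
import time

def retry_call(fn, retries: int = 3, base_delay: float = 0.5):
    """Call fn(), retrying up to `retries` times with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...

# Usage:
# response = retry_call(lambda: client.chat.completions.create(...))
```

In practice you would catch only retryable errors (timeouts, 429s, 5xx) rather than bare `Exception`, and add jitter to the delay.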
OpenRouter use cases and adoption: Common use cases include: (1) cost-sensitive applications (use cheaper models by default, escalate to GPT-4 only when necessary), (2) high-reliability applications (failover if one provider is down), (3) multi-modal applications (route images to a vision-capable model, text to GPT-4, code to a code-specialized model), and (4) experimental applications (test multiple models on the same data). Startups often adopt OpenRouter to avoid vendor lock-in; if they later want to switch providers, the code changes are minimal.
Monitoring and debugging: OpenRouter provides usage logs (via dashboard or API) showing per-model costs, latencies, and error rates. Set up cost alerts if spending exceeds a threshold. Track per-endpoint error rates (some models fail more often than others on specific tasks) and adjust routing policies accordingly. Use the models fallback list (shown earlier) to ensure critical requests always complete, even if the primary provider is down.
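A cost alert can be as simple as summing recent per-request costs pulled from the usage logs and comparing against a cap. A hypothetical sketch (the record shape with a `total_cost` field is an assumption about your log format):

```python
def over_budget(usage_records: list, daily_cap_usd: float) -> bool:
    """Return True if summed per-request costs exceed the daily cap."""
    total = sum(r.get("total_cost", 0.0) for r in usage_records)
    return total > daily_cap_usd

records = [{"total_cost": 0.12}, {"total_cost": 0.30}, {"total_cost": 0.05}]
print(over_budget(records, daily_cap_usd=0.40))  # True: $0.47 > $0.40
```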
One pitfall: assuming all models understand the same parameters. For example, `top_k` means different things in different implementations. OpenRouter normalizes these to a standard set, but knowing the mapping helps when debugging unexpected model behavior. Documentation is thorough, and the support team is responsive.
OpenRouter provides detailed cost comparison tools to help optimize inference expenses across different model providers and versions. Understanding pricing structures—per-token vs. per-request models, batch discounts, and regional pricing variations—enables informed decisions about cost-performance trade-offs. The platform facilitates A/B testing different models to find the best balance between quality and cost for your specific use case. Real-time pricing information helps you select the most economical option without sacrificing model quality.
Beyond simple cost comparison, OpenRouter's APIs support sophisticated routing strategies. Implementing fallback chains that gracefully degrade when preferred models are unavailable maintains application reliability. Weight-based routing, implemented on the client side, enables gradually rolling out new models while monitoring quality metrics, supporting production-grade deployments at scale.
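Weight-based rollout is straightforward to sketch client-side; OpenRouter itself never needs to know about the weights. Here roughly 10% of traffic goes to a candidate model (the split and model names are illustrative):

```python
import random

def choose_model(weights, rng=None):
    """Pick a model id with probability proportional to its weight."""
    rng = rng or random.Random()
    models = list(weights)
    return rng.choices(models, weights=[weights[m] for m in models], k=1)[0]

# Send ~10% of traffic to the candidate while monitoring quality metrics
rollout = {"openai/gpt-4o-mini": 0.9, "anthropic/claude-3-haiku": 0.1}
model = choose_model(rollout)
# client.chat.completions.create(model=model, ...)
```

Logging which model served each request alongside your quality metrics lets you ramp the candidate's weight up (or roll it back) with a one-line config change.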
```python
import openai
import requests

client = openai.OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-v1-...",
)

# Cost-optimized model selection: "openrouter/auto" lets OpenRouter choose,
# and the provider "sort" hint prioritizes cheaper providers
response = client.chat.completions.create(
    model="openrouter/auto",
    messages=[{"role": "user", "content": "Your query here"}],
    extra_body={"provider": {"sort": "price"}},
)

# Manual cost optimization: fetch the model list and pick the cheapest
models = requests.get("https://openrouter.ai/api/v1/models").json()["data"]
models_by_cost = sorted(models, key=lambda m: float(m["pricing"]["prompt"]))
selected_model = models_by_cost[0]["id"]
```
Optimizing inference costs becomes increasingly important as applications scale. OpenRouter addresses this challenge by aggregating multiple providers and offering transparent pricing, enabling data-driven decisions about model selection and routing strategies that balance cost and quality effectively.
The API standardization across multiple providers is a major advantage of platforms like OpenRouter. Whether using Claude, GPT-4, Llama, or Mistral models, the API remains consistent. This consistency enables straightforward swapping between models to evaluate different options or handle provider-specific issues. Applications built against OpenRouter APIs port easily to any underlying model, providing valuable flexibility and reducing vendor lock-in concerns.
Monitoring and analytics features help teams understand usage patterns and optimize costs. Detailed logging of API calls, token usage, and latency measurements enables data-driven decisions about cost optimization. Identifying models that perform well for your use cases while minimizing cost becomes tractable with comprehensive analytics.
The reliability layer in OpenRouter includes automatic retries, fallback strategies, and rate limiting protection. Rather than handling these concerns in application code, OpenRouter manages them transparently. This abstraction layer improves application reliability while reducing operational complexity. Teams can focus on application logic rather than low-level API resilience patterns.
OpenRouter continues evolving to support new models and capabilities as they emerge. The platform acts as a rapid adoption mechanism for cutting-edge models, making latest capabilities immediately available to applications. This forward-looking design keeps applications current without major refactoring, supporting continuous improvement as the AI landscape evolves.