URL: https://www.morphllm.com/openrouter-alternative

OpenRouter Alternatives (2026): 10 Providers Compared on Price, Rate Limits, and Compliance

OpenRouter passes through provider rates with a 5.5% credit fee. Go direct and you skip the fee, but lose unified billing. We compare DeepInfra, Fireworks, Together, Groq, Baseten, Modal, Replicate, and LiteLLM on exact token prices, GPU $/hr, rate limits, and SOC 2 / HIPAA status. DeepSeek V4 Pro runs $1.30 to $2.10 per M input depending on provider.

June 9, 2026 · 1 min read

OpenRouter is a router, not an inference provider. It passes provider token rates through with no markup and charges 5.5% on credit purchases. The cheapest alternative is usually the provider OpenRouter is already routing you to, accessed directly. DeepSeek V4 Pro costs $1.30 per M input on DeepInfra direct, $1.74 on Fireworks, $2.10 on Together. Same model, same weights. Below: exact prices, rate limits, GPU rates, and compliance for the 10 providers OpenRouter routes through.

5.5%: OpenRouter credit purchase fee
$1.30: DeepSeek V4 Pro /M input (DeepInfra, cheapest)
$1.79/hr: Cheapest dedicated H100 (DeepInfra)
500 tok/s: GPT-OSS 120B throughput (Groq)

What OpenRouter Actually Charges

OpenRouter does not mark up inference. You pay the same token rate you would pay the underlying provider. OpenRouter makes its money from the 5.5% fee on credit purchases, plus a 5% fee on bring-your-own-key requests after the first 1M free requests per month. Its homepage reports the scale: 400+ active models, 60+ providers, 100T monthly tokens, 8M+ users, 250k+ apps.

The free tier is tight. Without credits you get 50 requests per day. After buying at least $10 in credits, 1,000 requests per day. Free model variants (the :free suffix) are capped at 20 requests per minute. Unused credits can expire one year after purchase. None of this is a markup, but it is the cost of the convenience layer.

So the decision is not "is OpenRouter overpriced?" It is: do you need unified billing and automatic fallback across 60+ providers, or do you want the lowest direct price and full control? If the latter, you have two moves: go direct to a single cheaper provider, or self-host LiteLLM to keep the unified API with no per-request fee.

Pass-through pricing

No inference markup. 5.5% fee on credit purchases ($0.80 min). BYOK: first 1M requests/month free, then 5% of normal model cost.

Zero logging by default

No prompt or completion logging, even on errors, unless you opt in. Opting into logging earns a 1% usage discount.

Free tier limits

50 requests/day without credits, 1,000/day after $10 in credits. Free :free models capped at 20 requests/minute.
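
The fee schedule above reduces to a few lines of arithmetic. The sketch below is illustrative only: it assumes the 5.5% fee is added on top of the credit amount, that the $0.80 minimum applies per purchase, and that every BYOK request has the same normal cost. The function and constant names are hypothetical, not part of any OpenRouter API.

```typescript
// Fee constants taken from the figures quoted in this article.
const CREDIT_FEE_RATE = 0.055
const CREDIT_FEE_MIN_USD = 0.8
const BYOK_FREE_REQUESTS = 1000000
const BYOK_FEE_RATE = 0.05

/** Fee charged when buying `creditsUsd` of credits (assumed: fee added on top). */
function creditPurchaseFee(creditsUsd: number): number {
  return Math.max(creditsUsd * CREDIT_FEE_RATE, CREDIT_FEE_MIN_USD)
}

/**
 * BYOK fee for one month. Requests beyond the free allowance are charged
 * 5% of what the model would normally cost.
 * Assumption: every request has the same normal cost.
 */
function byokMonthlyFee(requests: number, normalCostPerRequestUsd: number): number {
  const billable = Math.max(requests - BYOK_FREE_REQUESTS, 0)
  return billable * normalCostPerRequestUsd * BYOK_FEE_RATE
}

// A $10 purchase hits the $0.80 minimum; a $1,000 purchase pays 5.5%.
console.log(creditPurchaseFee(10).toFixed(2))
console.log(creditPurchaseFee(1000).toFixed(2))
// 1.5M BYOK requests at a hypothetical $0.002 each: only 500k are billable.
console.log(byokMonthlyFee(1500000, 0.002).toFixed(2))
```

Note that under these assumptions the minimum makes small purchases proportionally more expensive: $0.80 on a $10 purchase is 8%, not 5.5%.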

Same Model, Five Prices

OpenRouter routes the same open model to multiple providers, each with its own direct rate. Going direct to the cheapest one is the largest lever, larger than the 5.5% credit fee. Prices below are per 1M tokens, input / output, from each provider's pricing page on June 9, 2026.

DeepSeek V4 Pro
Provider | Input | Output
DeepInfra | $1.30 | $2.60
Novita | $1.60 | $3.20
Baseten | $1.74 | $3.48
Fireworks (V4 Pro) | $1.74 | $3.48
Together | $2.10 | $4.40

Kimi K2.6
Provider | Input | Output
DeepInfra | $0.75 | $3.50
Novita | $0.80 | $3.40
Baseten | $0.95 | $4.00
Fireworks | $0.95 | $4.00
Together | $1.20 | $4.50

GLM-5.1
Provider | Input | Output
DeepInfra | $1.05 | $3.50
Baseten | $1.30 | $4.30
Novita | $1.38 | $4.40
Fireworks | $1.40 | $4.40
Together | $1.40 | $4.40

GPT-OSS 120B
Provider | Input | Output | Notes
Baseten | $0.10 | $0.50 | cheapest
Fireworks | $0.15 | $0.60 | $0.015 cached input
Together | $0.15 | $0.60 | -
Groq | $0.15 | $0.60 | 500 tok/s
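
To see how much the provider choice matters for a concrete bill, the sketch below ranks providers by total cost using the DeepSeek V4 Pro rates quoted in this article ($1.30/$2.60 on DeepInfra up to $2.10/$4.40 on Together). The workload of 500M input and 100M output tokens per month is a made-up example, and the type and function names are hypothetical.

```typescript
// Prices per 1M tokens for DeepSeek V4 Pro, as quoted in this article.
interface Price {
  provider: string
  inputPerM: number
  outputPerM: number
}

const deepseekV4Pro: Price[] = [
  { provider: 'DeepInfra', inputPerM: 1.3, outputPerM: 2.6 },
  { provider: 'Novita', inputPerM: 1.6, outputPerM: 3.2 },
  { provider: 'Baseten', inputPerM: 1.74, outputPerM: 3.48 },
  { provider: 'Fireworks', inputPerM: 1.74, outputPerM: 3.48 },
  { provider: 'Together', inputPerM: 2.1, outputPerM: 4.4 },
]

/** Total USD cost of a workload at one provider's input/output rates. */
function workloadCost(p: Price, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * p.inputPerM + (outputTokens / 1e6) * p.outputPerM
}

// Hypothetical monthly workload: 500M input tokens, 100M output tokens.
const ranked = deepseekV4Pro
  .map(p => ({ provider: p.provider, usd: workloadCost(p, 500e6, 100e6) }))
  .sort((a, b) => a.usd - b.usd)

for (const r of ranked) {
  console.log(`${r.provider}: $${r.usd.toFixed(2)}`)
}
```

For this workload the spread between the cheapest and most expensive provider is several hundred dollars a month, far more than a 5.5% credit fee on the cheapest bill.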

Cached input is the second lever

Most providers discount cached input tokens heavily. Fireworks and Groq give 50% off cached input. DeepInfra prices DeepSeek V4 Pro cached input at $0.10 (vs $1.30 uncached). For agent loops with a large fixed system prompt, cached pricing can matter more than the headline rate. Always check the cached column for your workload.
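
A quick way to judge the cached column is to compute the blended input price for your expected cache hit ratio. The sketch below uses the DeepInfra DeepSeek V4 Pro figures from the paragraph above ($1.30 uncached, $0.10 cached). The hit ratios are assumptions you would replace with measurements from your own workload, and `blendedInputPrice` is a hypothetical helper name.

```typescript
/**
 * Effective price per 1M input tokens when a fraction of input tokens
 * is served from the provider's prompt cache.
 */
function blendedInputPrice(
  uncachedPerM: number,
  cachedPerM: number,
  cacheHitRatio: number,
): number {
  if (cacheHitRatio < 0 || cacheHitRatio > 1) {
    throw new RangeError('cacheHitRatio must be between 0 and 1')
  }
  return cacheHitRatio * cachedPerM + (1 - cacheHitRatio) * uncachedPerM
}

// DeepInfra, DeepSeek V4 Pro: $1.30 uncached vs $0.10 cached per 1M input tokens.
for (const hit of [0, 0.5, 0.8, 0.95]) {
  const price = blendedInputPrice(1.3, 0.1, hit)
  console.log(`hit ratio ${hit}: $${price.toFixed(3)} per 1M input tokens`)
}
```

At an 80% hit ratio the blended rate is $0.34 per 1M input tokens, roughly a quarter of the headline price.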

Provider-by-Provider

The serverless providers OpenRouter routes to fall into three groups: per-token open-model shops (DeepInfra, Fireworks, Together, Novita), speed specialists (Groq, Cerebras), and run-your-own-code platforms that bill by GPU-second (Modal, Replicate, Baseten).

DeepInfra: cheapest per-token, no free tier

DeepInfra is the price leader on open models: DeepSeek V4 Pro $1.30/$2.60, V4 Flash $0.10/$0.20, Kimi K2.6 $0.75/$3.50, GLM-5.1 $1.05/$3.50, Qwen3-235B-A22B-Instruct $0.09/$0.10, Llama 3.1 8B $0.02/$0.05. It also resells Anthropic (Sonnet 4.6 at $3/$15, Opus 4.8 at $5/$25). Limit is 200 concurrent requests per account, postpaid with no free tier and no minimum. SOC 2 and ISO 27001 certified, zero retention on inference (only metadata logged), with HIPAA/GDPR measures. See the Fireworks vs DeepInfra breakdown.

Fireworks: predictable rate limits, fine-tuning

Fireworks publishes clear ceilings: 6,000 RPM with a payment method (10 RPM without), gated by monthly spend tiers ($50, $500, $5,000, $50,000). Serverless covers Kimi K2.6 $0.95/$4.00, DeepSeek V4 Flash $0.14/$0.28, GLM 5.1 $1.40/$4.40, GPT-OSS 120B $0.15/$0.60, MiniMax 2.7 $0.30/$1.20. Cached input gets 50% off and batch runs at 50% of serverless. Fine-tuning is priced per 1M training tokens, from $0.50 (LoRA SFT, up to 16B) to $20 (full SFT, >300B). SOC 2 Type II, HIPAA, zero data retention on open models. See Fireworks alternatives.

Together: serverless plus GPU clusters

Together spans serverless (DeepSeek V4 Pro $2.10/$4.40, GLM-5.1 $1.40/$4.40, Qwen3.5-397B-A17B $0.60/$3.60, GPT-OSS-120B $0.15/$0.60, Llama 3.3 70B $1.04/$1.04) and rentable GPU clusters (HGX H100 $5.49/hr on-demand, reserved from $3.99/hr). Rate limits are dynamic per model and scale with sustained traffic; for a fixed guarantee Together points you to a dedicated endpoint. Batch API gives up to 50% off on selected models. SOC 2 Type II. See Together AI alternatives.

Groq: throughput specialist

Groq optimizes for tokens per second on a fixed catalog: GPT-OSS 120B at 500 tok/s ($0.15/$0.60), GPT-OSS 20B at 1,000 tok/s ($0.075/$0.30), Llama 3.3 70B at 394 tok/s ($0.59/$0.79), Qwen3 32B at 662 tok/s ($0.29/$0.59), Kimi K2 Instruct $1.00/$3.00 (cached input $0.50). A free plan exists (e.g. 30 RPM, 14.4K requests/day on llama-3.1-8b-instant). Batch is 50% off with a 24-hour to 7-day completion window. SOC 2 Type II; HIPAA via BAA with preview/beta and compound systems excluded. Compare Fireworks vs Groq.

Modal, Replicate, Baseten: run your own model

These bill for compute, not tokens, which fits custom or fine-tuned models OpenRouter does not host. Modal bills per second (H100 ~$3.95/hr, B200 ~$6.25/hr), boots containers in ~1s, scales to zero, and gives $30/month free credits on Starter. Replicate scales to zero by default and only charges running prediction time (H100 $5.49/hr, A100 80GB $5.04/hr). Baseten bills per minute (H100 $6.50/hr dedicated, B200 $9.98/hr) and is SOC 2 Type II + HIPAA, but cold starts can take minutes, so its docs recommend min_replica >= 2 for production. See Baseten alternatives and Baseten vs Modal.

GPU Pricing for Dedicated Inference

When per-token serverless does not fit (custom model, sustained high throughput, or a fixed latency SLA), you rent GPUs directly. OpenRouter cannot do this; it only routes API calls. Published on-demand rates for an H100 80GB span almost 4x:

H100 80GB
Provider | Rate | Billing model
DeepInfra | $1.79/hr | dedicated
Modal | ~$3.95/hr | per-second ($0.001097/s)
Replicate | $5.49/hr | per-second, running time only
Together (cluster) | $5.49/hr | GPU cluster on-demand
Together (endpoint) | $6.49/hr | dedicated endpoint
Baseten | $6.50/hr | per-minute, dedicated
Fireworks | $7.00/hr | on-demand

B200
Provider | Rate | Notes
DeepInfra | $2.79/hr | cheapest published
Modal | ~$6.25/hr | $0.001736/s
Baseten | $9.98/hr | per-minute
Together (cluster) | $9.95/hr | $11.95/hr dedicated endpoint
Fireworks | $10.00/hr | -

Replicate publishes no B200 or H200 rate. Modal, Replicate, and Baseten all scale to zero, but cold starts differ sharply: Modal boots in ~1 second, while Baseten and Replicate cold boots for large models can take minutes. If you cannot tolerate a cold start, keep min replicas warm (which charges around the clock) or pick Modal's snapshotting.
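
The trade-off between warm replicas and scale-to-zero is easy to put in numbers. The sketch below compares two H100 setups using rates quoted in this article: two always-warm Baseten replicas at $6.50/hr, and one Modal GPU billed at $0.001097/s only while busy. The 30-day month and the 20% utilization figure are assumptions for illustration, and the helper names are hypothetical.

```typescript
const HOURS_PER_MONTH = 24 * 30 // assumption: 30-day month
const SECONDS_PER_MONTH = HOURS_PER_MONTH * 3600

/** Monthly cost of keeping `minReplicas` GPUs warm around the clock. */
function alwaysWarmMonthly(ratePerHour: number, minReplicas: number): number {
  return ratePerHour * minReplicas * HOURS_PER_MONTH
}

/** Monthly cost when billed per second only while a GPU is actually running. */
function scaleToZeroMonthly(ratePerSecond: number, busySeconds: number): number {
  return ratePerSecond * busySeconds
}

// Baseten H100 at $6.50/hr with two warm replicas.
const warm = alwaysWarmMonthly(6.5, 2)

// Modal H100 at $0.001097/s, busy 20% of the month (hypothetical utilization).
const busySeconds = 0.2 * SECONDS_PER_MONTH
const onDemand = scaleToZeroMonthly(0.001097, busySeconds)

console.log(`always warm:   $${warm.toFixed(2)}/month`)
console.log(`scale to zero: $${onDemand.toFixed(2)}/month`)
```

The comparison ignores cold-start latency, which is the thing the warm replicas are paying for. It only shows what that latency guarantee costs at a given utilization.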

Rate Limits Compared

OpenRouter's free models cap at 20 requests per minute and accounts without credits at 50 requests per day. Going direct usually buys higher, clearer limits.

Provider | Limit | How it scales
OpenRouter (free) | 20 RPM / 50 RPD | 1,000 RPD after $10 in credits
Fireworks | 6,000 RPM ceiling | 10 RPM without a card; spend tiers $50-$50,000/mo
DeepInfra | 200 concurrent | per account, postpaid
Together | Dynamic per-model | scales with sustained traffic; dedicated endpoint for a fixed limit
Groq (free) | 30 RPM / 14.4K RPD | example on llama-3.1-8b-instant; Developer plan is higher

Fireworks is the most predictable: add a card and you get a flat 6,000 RPM ceiling, with the monthly budget (not the request rate) gated by spend tiers. Together's dynamic model means you cannot quote a fixed number; if your contract needs one, you provision a dedicated endpoint.
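
Whatever the ceiling is, it helps to throttle on the client so you never send a request the provider would reject. The sketch below is a minimal sliding-window limiter for a requests-per-minute cap. It is one possible approach, not something any of these providers prescribe, and the `RpmLimiter` class and its injectable clock are hypothetical names introduced here.

```typescript
/** Sliding-window limiter: at most `maxPerMinute` requests in any 60-second window. */
class RpmLimiter {
  private timestamps: number[] = []
  private readonly maxPerMinute: number
  private readonly now: () => number

  // `now` is injectable so the limiter can be exercised with a fake clock.
  constructor(maxPerMinute: number, now: () => number = () => Date.now()) {
    this.maxPerMinute = maxPerMinute
    this.now = now
  }

  /**
   * Returns 0 and records the request if it may be sent now.
   * Otherwise returns the number of milliseconds to wait before retrying.
   */
  tryAcquire(): number {
    const t = this.now()
    const windowStart = t - 60000
    // Drop timestamps that have aged out of the 60-second window.
    while (this.timestamps.length > 0 && this.timestamps[0] <= windowStart) {
      this.timestamps.shift()
    }
    if (this.timestamps.length < this.maxPerMinute) {
      this.timestamps.push(t)
      return 0
    }
    return this.timestamps[0] + 60000 - t
  }
}

// Example: stay under the 20 RPM cap on OpenRouter's free model variants.
const freeTierLimiter = new RpmLimiter(20)
console.log(freeTierLimiter.tryAcquire()) // first request is allowed immediately
```

A caller would sleep for the returned number of milliseconds and try again. For a 6,000 RPM ceiling the same class works unchanged, though at that rate a token bucket would use less memory.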

Compliance: SOC 2, HIPAA, Zero Retention

OpenRouter is a third-party router: prompts and completions transit its infrastructure (it does not log them by default). For SOC 2, HIPAA, or data residency, going direct to a certified provider, or self-hosting LiteLLM, removes the extra hop.

Provider | SOC 2 Type II | HIPAA | Default retention
Baseten | Yes | Yes | Standard
Fireworks | Yes | Yes | Zero (open models)
Groq | Yes | BAA (preview/compound excluded) | Standard
Together | Yes | Contact | Standard
Modal | Yes (Starter tier) | Enterprise | Standard
DeepInfra | Yes (+ ISO 27001) | Measures in place | Zero (metadata only)
OpenRouter | - | - | Zero (no prompt logging)

DeepInfra notes one exception to zero retention: Google models, where Google logs prompts and responses for abuse detection. Fireworks states it does not log or store prompt or generation data for open models without explicit opt-in, with TLS 1.2+ in transit and AES-256 at rest.

LiteLLM vs OpenRouter

The query "litellm vs openrouter" is the second-most-searched phrase that lands on this page. Both give a unified OpenAI-compatible API across providers. The split is hosted vs self-hosted.

Feature | OpenRouter | LiteLLM
Deployment | Hosted router | Self-hosted (MIT)
Per-request fee | 5.5% on credit purchases | None
Keys & traffic | Transit OpenRouter | Stay in your infrastructure
Provider catalog | 400+ models, 60+ providers | 100+ providers (you configure)
Automatic fallback | Built in | Configurable retry/load balancing
Ops burden | None | You deploy, scale, monitor

LiteLLM is the right call when compliance or cost rules out a third-party hop: it is MIT-licensed, runs in your network, and adds no fee. The cost is operational: you run the proxy. OpenRouter wins when you want zero infrastructure and the broadest catalog with automatic fallback, and the 5.5% credit fee is acceptable. They are not mutually exclusive: many teams prototype on OpenRouter, then move hot paths to direct providers behind LiteLLM once volume justifies the ops.
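
If you go direct and still want fallback, the core behaviour is small enough to write yourself. The sketch below shows ordered fallback across providers in plain TypeScript. It is a simplified illustration of the idea, not how OpenRouter or LiteLLM implement it, and `Attempt` and `withFallback` are hypothetical names. The provider calls are stubbed so the example runs without network access.

```typescript
// One candidate provider: a label plus a function that performs the request.
type Attempt<T> = { name: string; call: () => Promise<T> }

/**
 * Tries each provider in order and returns the first successful result.
 * Rejects only if every provider fails, with all error messages collected.
 */
async function withFallback<T>(
  attempts: Attempt<T>[],
): Promise<{ provider: string; value: T }> {
  const errors: string[] = []
  for (const attempt of attempts) {
    try {
      return { provider: attempt.name, value: await attempt.call() }
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err)
      errors.push(`${attempt.name}: ${message}`)
    }
  }
  throw new Error(`all providers failed: ${errors.join('; ')}`)
}

// Stubbed providers: the first fails, the second answers.
const demoAttempts: Attempt<string>[] = [
  { name: 'primary', call: async () => { throw new Error('429 rate limited') } },
  { name: 'secondary', call: async () => 'hello from secondary' },
]

withFallback(demoAttempts).then(r => console.log(`${r.provider}: ${r.value}`))
```

In real use each `call` would wrap a request to one provider's OpenAI-compatible endpoint. A production version would also want timeouts and a rule for which errors are worth retrying elsewhere.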

Which Alternative to Pick

Your reason | Best alternative | Why
Lowest per-token price (open models) | DeepInfra (direct) | DeepSeek V4 Pro $1.30/M, no markup, no free-tier limits
Fastest output | Groq | GPT-OSS 120B at 500 tok/s, 20B at 1,000 tok/s
Keep unified API, drop the fee | LiteLLM (self-host) | MIT-licensed, no per-request fee, keys stay on-prem
Predictable rate limits | Fireworks | Flat 6,000 RPM ceiling with a card
SOC 2 + HIPAA + zero retention | Fireworks or DeepInfra | Both certified; both zero-retention by default
Run a custom / fine-tuned model | Modal or Baseten | GPU-second billing, scale to zero, your own weights
Cheapest dedicated GPUs | DeepInfra | H100 $1.79/hr, B200 $2.79/hr
Highest-fidelity DeepSeek + coding agents | Morph | 16-bit (bf16) activations, no fp8; codegen spec decoding + kernels
Broadest catalog, zero ops | Stay on OpenRouter | 400+ models, 60+ providers, automatic fallback

Going direct to DeepInfra (OpenAI-compatible, no 5.5% fee)

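import OpenAI from 'openai'

// DeepInfra speaks the OpenAI API. Same code as OpenRouter,
// different base URL, and you skip the 5.5% credit fee.
const client = new OpenAI({
  apiKey: process.env.DEEPINFRA_API_KEY,
  baseURL: 'https://api.deepinfra.com/v1/openai',
})

// Placeholder prompt; replace with your own input.
const userQuery = 'Summarize the tradeoffs of going direct to a provider.'

// Top-level await requires an ES module context.
const res = await client.chat.completions.create({
  model: 'deepseek-ai/DeepSeek-V4-Pro', // $1.30/M input direct
  messages: [{ role: 'user', content: userQuery }],
})

console.log(res.choices[0].message.content)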

If you want the unified-API ergonomics of OpenRouter without the fee, point a self-hosted LiteLLM proxy at the same direct providers. You keep one OpenAI-compatible endpoint, your keys never leave your network, and there is no per-request charge. For a deeper look at the routing layer itself, see what an LLM gateway is and how automatic model routing works.
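
Because both a direct provider and a LiteLLM proxy expose the same OpenAI-compatible chat completions route, switching between them is a matter of changing the base URL and key. The sketch below builds the HTTP request for either target without sending it. The proxy address `http://localhost:4000/v1`, the model alias `deepseek-v4-pro`, and the placeholder keys are assumptions for illustration; your proxy's address and model names depend on how you deploy and configure it.

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant'
  content: string
}

interface Endpoint {
  baseUrl: string
  apiKey: string
}

interface HttpRequest {
  url: string
  init: { method: string; headers: Record<string, string>; body: string }
}

/** Builds an OpenAI-compatible chat completions request for any base URL. */
function buildChatRequest(ep: Endpoint, model: string, messages: ChatMessage[]): HttpRequest {
  return {
    // Strip trailing slashes so the path joins cleanly.
    url: `${ep.baseUrl.replace(/\/+$/, '')}/chat/completions`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${ep.apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  }
}

// Direct to DeepInfra (base URL from this article); the key is a placeholder.
const directEndpoint: Endpoint = {
  baseUrl: 'https://api.deepinfra.com/v1/openai',
  apiKey: 'DEEPINFRA_KEY_PLACEHOLDER',
}

// Hypothetical self-hosted LiteLLM proxy; address and key are assumptions.
const proxyEndpoint: Endpoint = {
  baseUrl: 'http://localhost:4000/v1',
  apiKey: 'LITELLM_KEY_PLACEHOLDER',
}

const prompt: ChatMessage[] = [{ role: 'user', content: 'ping' }]
console.log(buildChatRequest(directEndpoint, 'deepseek-ai/DeepSeek-V4-Pro', prompt).url)
console.log(buildChatRequest(proxyEndpoint, 'deepseek-v4-pro', prompt).url)
```

To send a request, pass the result to `fetch(request.url, request.init)`. Only the `Endpoint` value changes between the direct and proxied setups.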

One axis the price tables miss is output fidelity. Most serverless providers quantize activations to fp8 to cut cost, which shifts the output away from what the unquantized reference model would produce. Morph serves DeepSeek with 16-bit (bf16) activations and does not quantize activations to fp8, so the output matches the reference model. That makes Morph the best place to run DeepSeek when fidelity matters, not just the cheapest token rate. morph-dsv4flash (DeepSeek V4 Flash) is $0.139/M input and $0.278/M output.

For coding agents specifically, Morph runs codegen-tuned speculative decoding (draft and ngram tuned on code) plus custom low-level inference kernels built for code generation, which makes it the fastest and highest-quality option for that workload. See Morph Open Source Models and pricing.

Frequently Asked Questions

What is the best OpenRouter alternative in 2026?

It depends on why you are leaving. For the lowest per-token price on open models, go direct to DeepInfra (DeepSeek V4 Pro at $1.30/M input vs $1.74-$2.10 elsewhere). For the fastest output, Groq runs GPT-OSS 120B at 500 tok/s. To keep OpenRouter's unified API without the 5.5% credit fee, self-host LiteLLM. For dedicated GPUs, DeepInfra is cheapest at $1.79/hr for an H100.

How much does OpenRouter charge on top of provider rates?

No markup on inference. The fee is 5.5% on credit purchases. For bring-your-own-key routing, the first 1M BYOK requests per month are free, then 5% of what the model would normally cost. Free model variants (the :free suffix) cap at 20 requests per minute, and accounts without credits get 50 requests per day (1,000/day after buying $10 in credits).

Is OpenRouter cheaper than going direct?

No. OpenRouter passes through the provider rate and adds 5.5% on credit purchases, so direct is at least 5.5% cheaper for the same provider. The larger savings come from picking a cheaper provider for the same model: DeepSeek V4 Pro is $1.30/M input on DeepInfra vs $2.10 on Together. OpenRouter's value is unified billing and fallback across 60+ providers, not price.

What is the difference between OpenRouter and LiteLLM?

OpenRouter is a hosted router (5.5% credit fee, keys transit their infrastructure). LiteLLM is an MIT-licensed proxy you self-host (no fee, keys stay on-prem). Both expose a unified OpenAI-compatible API. LiteLLM is the choice for SOC 2 / HIPAA / data residency; OpenRouter is the choice for zero infrastructure and the broadest catalog.

Which OpenRouter alternative has a free tier?

Fireworks gives new accounts $1 in credits, Modal gives $30/month on Starter, Groq has a free plan (30 RPM, 14.4K requests/day on llama-3.1-8b-instant), Together and Baseten offer free experimentation credits, and Cerebras has a free tier across all models. DeepInfra has no free tier but is postpaid with no minimum. LiteLLM is free to self-host.

What is the cheapest provider for DeepSeek and Kimi?

DeepInfra. DeepSeek V4 Pro is $1.30/$2.60 vs $1.74 on Fireworks/Baseten and $2.10 on Together. DeepSeek V4 Flash is $0.10/$0.20. Kimi K2.6 is $0.75/$3.50 vs $0.80 on Novita, $0.95 on Baseten and Fireworks, and $1.20 on Together.

Which OpenRouter alternatives are SOC 2 and HIPAA compliant?

SOC 2 Type II: Baseten, Fireworks, Groq, Together, Modal (Starter tier), DeepInfra (+ ISO 27001). HIPAA: Baseten, Fireworks, and Modal (Enterprise); Groq offers a BAA with preview/beta and compound systems excluded. Zero retention by default: Fireworks (open models), DeepInfra, and OpenRouter itself.

Which provider has the fewest rate limit headaches?

Fireworks publishes a fixed 6,000 RPM ceiling with a payment method (10 RPM without), gated by monthly spend tiers from $50 to $50,000. DeepInfra allows 200 concurrent requests per account. Together uses dynamic per-model limits and recommends a dedicated endpoint for a guaranteed fixed limit. OpenRouter free models cap at 20 RPM.
