Claude Code Router: Provider Options, Limits, and Production Routing Setup
This is not a question about whether Claude Code is good. It is a question about how your team operates Claude Code at scale: managing costs, handling rate limits, surviving provider outages, and keeping multiple coding agents running without stepping on each other's quota.
TL;DR
- Direct Anthropic gives you the closest-to-source experience but ties you to a single provider's limits and pricing.
- OpenRouter gives you provider diversity but introduces its own error layer and cost visibility challenges.
- A unified API gateway (like EvoLink) gives Claude Code an Anthropic-compatible endpoint with multi-provider fallback at the gateway level.
- The right choice depends on your team size, workload burstiness, cost sensitivity, and fallback requirements.
- Use the routing option matrix below to match your situation.
Why coding agents need more than a single provider
A single developer using Claude Code through the Anthropic API rarely hits problems. But coding agent workloads at team scale behave differently:
| Team pattern | What happens | Why single-provider breaks |
|---|---|---|
| 3-5 developers, all on Claude Code | Concurrent long-context sessions compete for the same org quota | One developer's large refactoring task can starve others |
| CI/CD pipelines using Claude | Burst traffic during deployments and PR reviews | Short burst can hit RPM/TPM limits while monthly usage looks fine |
| Multi-agent orchestration | Tool fanout, retries, and background tasks stack | Cumulative token usage far exceeds what simple chat would generate |
| Mixed model needs | Some tasks need Opus, some need Sonnet, some need a cheaper option | Single-provider lock-in means over-paying or under-serving some tasks |
If any of these patterns match your team, the question is not "should I use a router?" but "which routing approach fits my workload?"
Provider options and tradeoffs
Option 1: Direct Anthropic API
{
  "env": {
    "ANTHROPIC_API_KEY": "sk-ant-..."
  },
  "permissions": {
    "allow": [],
    "deny": []
  }
}

- Direct access to Claude models with no intermediary
- Anthropic's official rate limits and pricing
- Simplest setup: no extra vendor in the path
- No automatic fallback if Anthropic is down or rate-limiting
- Org-level rate limits shared across all your developers
- No model-switching without code changes
- No cost optimization beyond Anthropic's pricing tiers
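With the direct setup, any failover has to live in your own code. The following is a minimal sketch of that idea, not a production client: `Provider`, `ProviderError`, and `complete_with_fallback` are hypothetical names, each provider is modeled as a plain callable, and the set of retryable statuses is an assumption you should adjust to your providers' documented errors.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider call; carries the HTTP status that caused it."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


# Assumption: statuses where trying a different provider is worthwhile.
RETRYABLE = {429, 500, 502, 503, 529}


@dataclass
class Provider:
    name: str
    call: Callable[[str], str]  # hypothetical: takes a prompt, returns text


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order; return (provider_name, response_text)."""
    last_error: ProviderError | None = None
    for provider in providers:
        try:
            return provider.name, provider.call(prompt)
        except ProviderError as err:
            if err.status not in RETRYABLE:
                raise  # e.g. 400: a malformed request is not fixed by failing over
            last_error = err
    raise RuntimeError(f"all providers failed; last error: {last_error}")
```

A gateway does the same thing one layer down, which is why the fallback row in the matrix matters more as team size grows.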
Option 2: OpenRouter
Claude Code connects to OpenRouter via environment variables that override the default Anthropic endpoint. OpenRouter exposes an Anthropic Messages API-compatible "skin," not a standard OpenAI chat completions endpoint:
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://openrouter.ai/api",
    "ANTHROPIC_AUTH_TOKEN": "sk-or-...",
    "ANTHROPIC_API_KEY": ""
  },
  "permissions": {
    "allow": [],
    "deny": []
  }
}

- Access to Claude plus other models through one API
- OpenRouter's provider routing with allow_fallbacks enabled by default
- Broad model catalog if you want to experiment
- An additional error layer: OpenRouter's own errors sit on top of upstream provider errors
- Credit purchase and platform fees can affect effective cost: OpenRouter does not mark up provider inference pricing, but platform fees apply on credit purchases and BYOK overages
- For free models, OpenRouter enforces its own rate limits (20 RPM, 50-1000 requests/day); for paid models, upstream provider limits are usually the main constraint
Option 3: Anthropic-compatible gateway (EvoLink)
Set ANTHROPIC_BASE_URL to point at EvoLink's Anthropic-compatible proxy endpoint:

{
  "env": {
    "ANTHROPIC_AUTH_TOKEN": "your-evolink-api-key",
    "ANTHROPIC_BASE_URL": "https://direct.evolink.ai",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  },
  "permissions": {
    "allow": [],
    "deny": []
  }
}

- Anthropic-compatible interface: Claude Code sends standard Anthropic Messages API requests, and EvoLink proxies them with gateway-level routing
- Fallback and model selection handled at the infrastructure level
- One API key for text, image, and video models
- Cost routing designed to reduce effective spend
- Another vendor in the request path (like any gateway)
- Need to verify that specific Claude models are available through EvoLink's catalog
Claude Code routing option matrix
| Factor | Direct Anthropic | OpenRouter | EvoLink (Unified Gateway) |
|---|---|---|---|
| Setup complexity | Low: just an API key | Low: env vars (ANTHROPIC_BASE_URL + token) | Low: env vars (ANTHROPIC_BASE_URL + key) |
| Model access | Claude only | Claude + many others | Claude + 40+ models |
| Rate limit scope | Anthropic org limits | Upstream provider limits (paid models); OpenRouter platform limits (free models) | Gateway-managed limits |
| Fallback on failure | None: you build it | Provider-level fallback (allow_fallbacks=true by default) | Gateway-level automatic fallback |
| Cost visibility | Direct Anthropic billing | Credit/platform fees on top of provider pricing | Per-key usage tracking |
| Error complexity | Single layer | Two layers (OpenRouter + provider) | Two layers (gateway + provider) |
| Multi-model routing | Manual code changes | openrouter/auto or explicit model | evolink/auto or explicit model |
| API compatibility | Native Anthropic SDK | Anthropic Messages API-compatible ("Anthropic skin") | Anthropic-compatible proxy |
| Best for | Solo / small team, Claude-only | Model experimentation, broad catalog | Production routing, cost optimization |
Common limits to plan for
Regardless of which provider you choose, coding agent workloads hit these limits:
Quota and rate limits
| Limit type | What triggers it | Impact on coding agents |
|---|---|---|
| RPM (Requests per Minute) | Too many requests in a short window | Parallel tool calls and multi-agent setups hit this fast |
| TPM (Tokens per Minute) | Large context or long outputs | One big refactoring prompt can consume minutes of budget |
| Daily limits | Sustained high usage | CI/CD pipelines can exhaust daily quota by afternoon |
| Org-level sharing | Multiple developers on same org | One person's burst blocks everyone else |
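The gap between average and peak load is easiest to see with numbers. The sketch below computes the worst sliding-window request count from a list of request timestamps; the log itself is invented for illustration (a short burst followed by a quiet hour), and `peak_requests_per_minute` is a hypothetical helper, not part of any provider SDK.

```python
from collections import deque
from typing import Iterable


def peak_requests_per_minute(timestamps: Iterable[float], window_s: float = 60.0) -> int:
    """Return the largest number of requests seen in any sliding window."""
    window: deque[float] = deque()
    peak = 0
    for ts in sorted(timestamps):
        window.append(ts)
        # Drop requests that fell out of the window ending at ts.
        while window[0] <= ts - window_s:
            window.popleft()
        peak = max(peak, len(window))
    return peak


# Invented log: 40 requests inside 10 seconds, then one request per minute for an hour.
burst = [i * 0.25 for i in range(40)]
steady = [600.0 + 60.0 * i for i in range(60)]
log = burst + steady

average_rpm = len(log) / ((max(log) - min(log)) / 60.0)
print(f"average RPM: {average_rpm:.1f}, peak RPM: {peak_requests_per_minute(log)}")
# prints: average RPM: 1.4, peak RPM: 40
```

An average of 1.4 RPM looks harmless on a dashboard, but the limiter only sees the 40-request minute. Running this kind of check over your own request logs is one way to learn which limit you will hit first.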
Context window pressure
Current Claude models support up to 1M token context windows (older routes may still expose 200K). Large inputs mean:
- Higher cost per request
- Longer response time
- Greater chance of hitting TPM limits
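A quick back-of-the-envelope calculation shows how these three effects scale with prompt size. The prices and the TPM limit below are placeholders chosen for the arithmetic, not real figures for any provider, and both helper functions are hypothetical; substitute your own tier's numbers.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
    """Cost of one request given per-million-token prices."""
    return (input_tokens * usd_per_mtok_in + output_tokens * usd_per_mtok_out) / 1_000_000


def minutes_of_tpm_budget(tokens: int, tpm_limit: int) -> float:
    """How many minutes of a tokens-per-minute budget one request consumes."""
    return tokens / tpm_limit


# Placeholder values, NOT real pricing or limits: look up your provider's current rates.
PRICE_IN, PRICE_OUT = 3.0, 15.0
TPM_LIMIT = 400_000

small = request_cost_usd(20_000, 2_000, PRICE_IN, PRICE_OUT)
large = request_cost_usd(800_000, 2_000, PRICE_IN, PRICE_OUT)
print(f"20K-token prompt: ${small:.2f}, 800K-token prompt: ${large:.2f}")
# prints: 20K-token prompt: $0.09, 800K-token prompt: $2.43
print(f"800K tokens uses {minutes_of_tpm_budget(800_000, TPM_LIMIT):.1f} minutes of TPM budget")
# prints: 800K tokens uses 2.0 minutes of TPM budget
```

This sketch assumes flat per-token pricing. Some providers may price long-context requests at a different rate, so treat the result as a lower bound until you have checked.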
Provider errors
When errors happen, the source matters:
- Direct Anthropic errors are straightforward to diagnose
- OpenRouter errors can come from OpenRouter itself or from the upstream provider; learn to distinguish them
- Gateway errors follow the same pattern: check whether the gateway or the upstream provider returned the error
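For the direct Anthropic case, the documented HTTP error statuses map fairly cleanly onto what a client should do next. The sketch below encodes that mapping; the action labels, the 5-second default wait, and the assumption that header names arrive lowercased are this example's own choices. Routers and gateways may wrap or remap these statuses, so verify against the layer you actually call.

```python
from __future__ import annotations

from typing import Mapping, Optional

# HTTP statuses documented for the Anthropic API, mapped to a suggested action.
# The action labels are this sketch's own convention.
ACTIONS = {
    400: "fix_request",      # invalid_request_error
    401: "fix_credentials",  # authentication_error
    403: "fix_credentials",  # permission_error
    404: "fix_request",      # not_found_error (for example, an unknown model ID)
    413: "shrink_request",   # request_too_large
    429: "backoff",          # rate_limit_error
    500: "retry",            # api_error
    529: "retry",            # overloaded_error
}


def classify(status: int, headers: Mapping[str, str]) -> tuple[str, Optional[float]]:
    """Return (action, seconds_to_wait) for a failed response.

    Assumes header names have already been lowercased by the caller.
    """
    action = ACTIONS.get(status, "retry" if status >= 500 else "inspect")
    wait = None
    if action == "backoff":
        # Honor retry-after when it is a plain number; otherwise use a default.
        raw = headers.get("retry-after")
        wait = float(raw) if raw and raw.replace(".", "", 1).isdigit() else 5.0
    return action, wait
```

Logging the action alongside the raw response body makes it easier to tell afterwards which layer produced an error.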
Production setup checklist
Before routing Claude Code through any provider, verify:
- API key works: send a minimal test request before configuring Claude Code
- Model ID is correct: model naming varies by provider
- Rate limits are known: check your tier's RPM/TPM/daily limits
- Cost is estimated: calculate expected daily spend based on team size and workload
- Fallback plan exists: decide what happens when the primary provider is down
- Multiple developers coordinated: if sharing an org/project, plan for quota contention
- Monitoring in place: log request counts, token usage, error rates, and latency
- Timeout configured: coding agent requests can run long, so make sure your client timeout allows for them
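The first checklist item can be sketched with the standard library alone. This example builds, but does not send, a minimal Anthropic Messages API request from the same environment variables used in the configs above. `build_test_request` is a hypothetical helper; the `/v1/messages` path and the Bearer-versus-`x-api-key` choice reflect how the Anthropic API and `ANTHROPIC_AUTH_TOKEN` are documented to work, but confirm the expected path and model ID with your router or gateway.

```python
import json
import os
import urllib.request


def build_test_request(base_url: str, model: str, *, api_key: str = "",
                       auth_token: str = "") -> urllib.request.Request:
    """Build (but do not send) a minimal Anthropic Messages API request."""
    headers = {"content-type": "application/json", "anthropic-version": "2023-06-01"}
    if auth_token:
        headers["authorization"] = f"Bearer {auth_token}"  # ANTHROPIC_AUTH_TOKEN style
    else:
        headers["x-api-key"] = api_key  # ANTHROPIC_API_KEY style
    body = {"model": model, "max_tokens": 16,
            "messages": [{"role": "user", "content": "ping"}]}
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/messages",
        data=json.dumps(body).encode("utf-8"), headers=headers, method="POST")


req = build_test_request(
    os.environ.get("ANTHROPIC_BASE_URL", "https://api.anthropic.com"),
    "claude-sonnet-4-20250514",  # model IDs vary by provider; adjust as needed
    api_key=os.environ.get("ANTHROPIC_API_KEY", ""),
    auth_token=os.environ.get("ANTHROPIC_AUTH_TOKEN", ""),
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=30)
```

Keeping the send step separate lets you inspect the URL and headers first, which catches most base-URL and key mix-ups before any quota is spent.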
When EvoLink-style routing helps
You do not need a routing gateway if:
- You are a solo developer with predictable Claude usage
- You only need one model family
- You already have your own retry and fallback logic
You benefit from gateway routing when:
- Your team runs 3+ concurrent coding agent sessions
- You want to mix Claude, GPT, DeepSeek, or Qwen models by task type
- You want fallback to happen at the infrastructure level, not in your application code
- You care about cost optimization across providers
curl https://api.evolink.ai/v1/chat/completions \
  -H "Authorization: Bearer $EVOLINK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "evolink/auto",
    "messages": [
      {"role": "user", "content": "Refactor this module to use dependency injection."}
    ]
  }'

Related articles
- Claude Code with OpenRouter: Limits, Errors, and Alternatives (detailed OpenRouter comparison for coding agents)
- One Gateway for 3 Coding CLIs (set up Gemini CLI, Codex CLI, and Claude Code through one gateway)
- Fix OpenRouter 429 "Provider Returned Error" (debug OpenRouter-specific errors)
- Model Not Found in OpenAI-Compatible APIs (fix model ID mismatches when switching providers)
- How to Reduce 429 Errors in Agent Workloads (throttling and retry patterns for agent traffic)
FAQ
What is a Claude Code router?
A Claude Code router can be as simple as setting ANTHROPIC_BASE_URL to point at a different Anthropic-compatible endpoint, or as complete as a unified API gateway that handles provider selection, fallback, and cost routing automatically.
Can I use Claude Code with a non-Anthropic provider?
Yes. Set ANTHROPIC_BASE_URL to override the default Anthropic endpoint. Any service that exposes an Anthropic Messages API-compatible endpoint can serve as a proxy, including OpenRouter (which provides an "Anthropic skin"), EvoLink, and self-hosted solutions. This is not the same as a generic OpenAI-compatible endpoint; Claude Code expects the Anthropic API format.
Will routing add latency to my coding agent?
Any additional hop adds some latency. For most coding agent workloads, the added latency from a gateway (typically 10-50 ms) is negligible compared to model inference time (often seconds). The tradeoff is latency vs. fallback and cost benefits.
How do I handle rate limits across a team?
Three approaches: (1) use separate API keys per developer to isolate quota, (2) implement client-side throttling in your coding agent workflows, (3) use a gateway that manages rate limits at the infrastructure level.
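Approach (2) is commonly implemented as a token bucket. The sketch below is one minimal version, assuming a single process and a rate you pick to sit below your tier's RPM limit; `TokenBucket` is a hypothetical class, and the injectable `clock` exists only to make it testable.

```python
import time


class TokenBucket:
    """Allow about `rate_per_min` requests per minute, with bursts up to `capacity`."""

    def __init__(self, rate_per_min: float, capacity: int, clock=time.monotonic) -> None:
        self.rate_per_s = rate_per_min / 60.0
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False if the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate_per_s)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Usage: wait for a token before each request.
bucket = TokenBucket(rate_per_min=30, capacity=5)
while not bucket.try_acquire():
    time.sleep(0.1)
# ...send the request here...
```

This only throttles one process. Developers sharing an org quota would each need their own share of the rate, which is the coordination problem that approaches (1) and (3) address.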
Should I use evolink/auto or a specific model for coding?
Use a specific model (for example, claude-sonnet-4-20250514) when you need predictable behavior for a tested workflow. Use evolink/auto when you want the router to optimize for cost-quality tradeoffs across mixed coding tasks.