Quick Answer
Last updated June 2026
TL;DR
Set a custom model in Claude Code with four environment variables. Point ANTHROPIC_BASE_URL at any gateway that speaks the Anthropic Messages API (/v1/messages), set the token, then name the main model and the background model. Claude Code routes every request to that gateway instead of api.anthropic.com.
export ANTHROPIC_BASE_URL="https://openrouter.ai/api" # any Anthropic-Messages gateway
export ANTHROPIC_AUTH_TOKEN="sk-or-v1-your-key" # the gateway key
export ANTHROPIC_MODEL="anthropic/claude-sonnet-4.5" # main model
export ANTHROPIC_SMALL_FAST_MODEL="anthropic/claude-haiku-4.5" # background tasks
export ANTHROPIC_API_KEY="" # keep empty so Claude Code does not fall back to Anthropic authOpenRouter (200+ models), Z.AI, and DeepSeek expose the Anthropic Messages format natively, so no proxy is needed. For OpenAI-format, self-hosted, or local models like Ollama, run a LiteLLM gateway: Claude Code talks to LiteLLM, LiteLLM forwards to the upstream provider.
Set a Custom Model in Claude Code
Claude Code hardcodes three model names internally: sonnet, opus, and haiku. To run a custom or open-weight model, you remap those names through ANTHROPIC_MODEL (the main model) and ANTHROPIC_SMALL_FAST_MODEL (background tasks like summarization and title generation). Both point at names your gateway exposes. GitHub issue 12969 tracks making the model ID fully configurable, including AWS China regions and fine-tuned models.
Custom model override
# Remap the hardcoded names to whatever the gateway exposes
export ANTHROPIC_MODEL="deepseek-v4-pro" # used wherever Claude Code asks for sonnet/opus
export ANTHROPIC_SMALL_FAST_MODEL="deepseek-v4-flash" # used wherever it asks for haiku
claudeSetting ANTHROPIC_SMALL_FAST_MODEL matters more than people expect. Claude Code fires the small-fast model constantly for non-coding chores. Pointing it at a cheap tier keeps those off your expensive model. Leave it unset and Claude Code falls back to a Claude haiku alias your gateway may not have.
Pick a Method: Provider Decision Table
The right method depends on whether your provider already speaks the Anthropic Messages API. If it does, point Claude Code straight at it. If it speaks OpenAI format or runs locally, put a LiteLLM gateway in front.
| Method | Auth | Env vars | /v1/messages native? | Models | Cost lever | Best for |
|---|---|---|---|---|---|---|
| Direct OpenRouter | OpenRouter key | BASE_URL + AUTH_TOKEN + MODEL | Yes (Anthropic skin) | 200+ | Volume aggregation | Fastest switch, widest catalog |
| LiteLLM gateway | Gateway key (any value local) | All four + model_list | Yes (LiteLLM translates) | Any provider | Routing + fallbacks + caps | OpenAI-format, self-hosted, mixed |
| Claude Code Router | Per-provider keys | Config file, no BASE_URL | Yes (router serves it) | Many | Per-task routing rules | Rule-based provider switching |
| Morph open models via gateway | Morph key | All four (via LiteLLM) | No (OpenAI-only upstream) | DeepSeek V4, Qwen, MiniMax | bf16 + cheap open weights | Cheapest high-quality open models |
Morph serves open models on an OpenAI-compatible endpoint at https://api.morphllm.com/v1 only. There is no /v1/messages and no Anthropic path, so to use a Morph-served model inside Claude Code you forward to it through a LiteLLM gateway: LiteLLM is the Anthropic-compatible shim, Morph is the upstream model. morph-dsv4flash (DeepSeek V4 Flash) is $0.139/M input and $0.278/M output, about 21x cheaper than Claude Sonnet's $3/M input.
Why Developers Switch to Different Models
Claude Code ships locked to Anthropic's API. For most people, that's fine. But you'll want an escape hatch when:
- Rate limits hit hard. Anthropic's per-minute caps are real. During crunch time, you'll hit them. One team we talked to started routing overflow to GPT-5.5 just to keep shipping.
- Budget constraints. Claude Sonnet runs about $3/M input tokens. DeepSeek V4 Flash served by Morph is $0.139/M input, about 21x less. For bulk refactoring or test generation, that gap matters.
- Air-gapped environments. Defense contractors and healthcare orgs often can't send code to external APIs. Period.
- Curious about alternatives. Qwen3-Coder is surprisingly good at Python. Gemini's 1M context handles massive monorepos. You won't know until you try.
A setup we've seen work: GPT-5.5 for architecture discussions and planning, Claude for implementation (still the strongest at tool use), and a local 14B model for quick questions while offline. The gateway layer makes switching trivial.
Fair Warning
Claude Code was purpose-built for Claude models. According to Anthropic's engineering blog, the agent relies heavily on tool calling patterns that Claude handles natively. Other models can struggle with:
- Multi-step tool chains (especially the "read file → think → edit file" loop)
- Generating diffs that actually apply cleanly
- Extended thinking and artifacts (Opus-only features anyway)
GPT-5.5 and Gemini 3.1 Pro are solid alternatives for most workflows.
Method 1: Direct API Configuration
The simplest approach works with providers that offer Anthropic-compatible endpoints. No proxy needed.
Providers with Native Anthropic API Support
| Provider | Base URL | Notes |
|---|---|---|
| OpenRouter | https://openrouter.ai/api | 200+ models, Anthropic skin built-in |
| Z.AI (GLM) | https://api.z.ai/api/anthropic | Chinese models, global access |
| Amazon Bedrock | Via LiteLLM proxy | Enterprise, requires IAM setup |
| Azure OpenAI | Via LiteLLM proxy | Enterprise, requires configuration |
Step-by-Step Setup
1. Get your API key from your chosen provider.
2. Set environment variables in your terminal:
Terminal Configuration
# For OpenRouter
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="sk-or-v1-your-key"
# Prevent Claude Code from trying Anthropic auth
export ANTHROPIC_API_KEY=""3. Make it permanent by adding to your shell profile:
Shell Profile (~/.zshrc)
# Add to ~/.bashrc, ~/.zshrc, or ~/.config/fish/config.fish
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="sk-or-v1-your-key"
export ANTHROPIC_API_KEY=""4. Verify the connection:
# Restart your terminal, then run Claude Code
claude
# Inside Claude Code, check status
/statusCommon Mistake
Don't put these in a project .env file. Claude Code doesn't read standard .env files. Use your shell profile or set them in the terminal directly.
Method 2: LiteLLM Proxy
LiteLLM acts as a universal translator between Claude Code and any LLM provider. It handles the API format conversion automatically.
Use this method when:
- Your provider only offers OpenAI-format APIs
- You want to route between multiple providers with fallbacks
- You need usage tracking and cost controls
- You're connecting to self-hosted models
Quick Setup
LiteLLM Installation
# Install LiteLLM
pip install litellm
# Create config file
cat > litellm_config.yaml << 'EOF'
model_list:
- model_name: claude-sonnet-4-5-20250929
litellm_params:
model: openrouter/anthropic/claude-sonnet-4.5
api_key: sk-or-v1-your-key
- model_name: gpt-5.5
litellm_params:
model: openai/gpt-5.5
api_key: sk-your-openai-key
- model_name: morph-dsv4flash
litellm_params:
model: openai/morph-dsv4flash
api_base: https://api.morphllm.com/v1
api_key: sk-your-morph-key
EOF
# Start the proxy
litellm --config litellm_config.yaml --port 4000Connect Claude Code to LiteLLM
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="sk-1234" # Any value works locally
export ANTHROPIC_MODEL="claude-sonnet-4-5-20250929" # Or gpt-5.5, morph-dsv4flash
export ANTHROPIC_SMALL_FAST_MODEL="morph-dsv4flash" # cheap model for background tasks
# Now run Claude Code
claudeThe proxy translates requests on the fly. Claude Code thinks it's talking to Anthropic, but your requests go to whatever model you configured.
Shell Aliases for Fast Switching
Quick Model Switching
# Add to ~/.zshrc or ~/.bashrc
alias claude-gpt='ANTHROPIC_MODEL=gpt-5.5 claude'
alias claude-ds='ANTHROPIC_MODEL=morph-dsv4flash claude'
alias claude-local='ANTHROPIC_MODEL=ollama/codellama claude'
# Usage
claude-gpt # Starts Claude Code with GPT-5.5
claude-ds # Starts Claude Code with DeepSeek V4 FlashMethod 3: OpenRouter Direct Connection
OpenRouter provides an "Anthropic skin" that speaks Claude Code's native protocol. No proxy needed, direct connection with access to 200+ models.
Setup
# Get your key from openrouter.ai/keys
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="sk-or-v1-your-key"
export ANTHROPIC_API_KEY=""
# Start Claude Code
claudeSelecting Models
Use the /model command inside Claude Code to switch between OpenRouter models:
# Inside Claude Code
/model anthropic/claude-sonnet-4.5 # Claude Sonnet
/model openai/gpt-5.5 # GPT-5.5
/model google/gemini-3.1-pro # Gemini 3.1 Pro
/model qwen/qwen3-coder # Qwen3-CoderOpenRouter Pricing
OpenRouter often beats direct API pricing through volume aggregation. Check openrouter.ai/models for current rates. Some models are free for limited use.
Method 4: anyclaude Wrapper
anyclaude is a drop-in wrapper from Coder that makes multi-provider switching dead simple. No proxy server, no config files—just environment variables.
anyclaude Setup
# Install with your package manager
bun add -g anyclaude # or npm, pnpm
# Set your keys
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."
export XAI_API_KEY="..."
# Run with any model using prefixes
anyclaude --model openai/gpt-5.5
anyclaude --model google/gemini-3.1-pro
anyclaude --model xai/grok-4The prefix tells anyclaude which provider to route to. It handles the API translation under the hood. For OpenRouter models, set OPENAI_API_URL to their endpoint.
This is probably the fastest way to test different models without committing to a full proxy setup. The tradeoff: less flexibility than LiteLLM for complex routing rules.
Method 5: Local LLMs with Ollama
Running models locally eliminates API costs and keeps your code private. The tradeoff: you need decent hardware and most local models don't support tool calling well.
Hardware Requirements
| Model Size | VRAM Needed | Example Models |
|---|---|---|
| 7B parameters | 8GB | CodeLlama-7B, Qwen2.5-Coder-7B |
| 14B parameters | 16GB | Qwen2.5-Coder-14B |
| 32B parameters | 24GB | Qwen2.5-Coder-32B |
| 70B+ parameters | 48GB+ | CodeLlama-70B, DeepSeek-Coder-V2 |
Setup with Ollama + LiteLLM
Local Model Setup
# 1. Install and start Ollama
brew install ollama # macOS
ollama serve
# 2. Pull a coding model
ollama pull qwen2.5-coder:14b
# 3. Configure LiteLLM to use Ollama
cat > litellm_config.yaml << 'EOF'
model_list:
- model_name: claude-sonnet-4-5-20250929
litellm_params:
model: ollama/qwen2.5-coder:14b
api_base: http://localhost:11434
EOF
# 4. Start proxy
litellm --config litellm_config.yaml --port 4000
# 5. Connect Claude Code
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="local"
claudeTool Calling Limitations
Most local models have weak or no tool calling support. This means Claude Code can generate code suggestions but can't automatically read files, run commands, or apply edits. You'll need to copy-paste more manually.
Model Compatibility Matrix
Claude Code's full feature set requires tool calling, large context windows, and reliable instruction following. Here's how popular models stack up:
| Model | Tool Calling | Context | Edit Reliability | Best For |
|---|---|---|---|---|
| Claude Sonnet 4.5 | Excellent | 200K | Highest | Full Claude Code experience |
| GPT-5.5 | Good | 256K | High | Planning, complex reasoning |
| GPT-5.5-mini | Good | 256K | High | Cost-efficient tasks |
| Gemini 3.1 Pro | Good | 1M | Medium-high | Large codebase analysis |
| DeepSeek V4-Pro | Good | 1M | Medium-high | Cheap agentic coding |
| Qwen3-Coder | Limited | 256K | Medium | Code generation |
| Local 7B models | Poor | 8-32K | Low | Quick prototyping only |
"Edit Reliability" is a relative ranking of how often a model's suggested changes apply cleanly without manual intervention, based on hands-on use, not a single published benchmark. Lower reliability means more failed edits and retry loops. Treat the ordering as directional.
Route by Difficulty Instead of Picking One Model
Once Claude Code can talk to any provider, the better setup is not picking a single replacement model. It is routing each turn to the right one. Most requests to a coding agent are routine: renaming a variable, formatting output, writing a boilerplate test, summarizing a diff. Those run identically on a cheap model as on a frontier one. 60-80% of coding-agent requests fall into that bucket, so sending all of them to Opus or GPT-5.5 burns money for no quality gain. For a team, that is the difference between a bill that grows with headcount and one that grows with how hard everyone leans on the agent.
A router classifies each turn by difficulty, then your gateway calls the cheapest model that handles it. Easy work goes to Haiku 4.5 or an open model, hard work to Opus 4.7 or GPT-5.5. You set the policy once instead of switching models by hand mid-session. The classify step is a separate API call: it returns a model name, it does not proxy the LLM request. Your LiteLLM gateway (the one Claude Code points at via ANTHROPIC_BASE_URL) reads that name and forwards the turn to the chosen upstream.
Morph Router exposes POST /v1/router/classify and /v1/router/multimodel. classify returns difficulty, ambiguity, and domain signals; multimodel returns a single recommended model name. Both classify-only: the router does not proxy the LLM request, so you call the returned model yourself (through your gateway, or your own SDK). That keeps the contract and data inside whatever provider you already use. For a team standardizing the setup, see the Claude Code Enterprise writeup on per-seat controls, or the cost optimization guide.
Fixing Common Problems
These come up constantly when troubleshooting alternative model setups. Most are fixable in under a minute.
"Cannot find matching context" / Edits Keep Failing
You ask for a change. The model generates what looks like valid code. Claude Code rejects it. This is the most common issue—GitHub is full of these reports.
The root cause: Claude Code's edit system expects diffs in a specific format. Claude generates them correctly because Anthropic trained it that way. Other models approximate but miss details—wrong line numbers, bad whitespace handling, mismatched context lines.
Quick fixes:
- Ask for smaller edits. "Change the function signature" works better than "refactor this file"
- Say "read the file first" before asking for edits. Forces the model to refresh its view
- Use
/compactbetween major changes to reset context
Proper fix: Use a dedicated code editing layer (like the Morph MCP) that intercepts edit operations and re-applies them with a purpose-built merge model. It takes the raw model diff and applies it cleanly, which removes most of the format-mismatch failures you get from applying alternative-model diffs directly.
Sessions Degrade Over Time
Start fresh, everything works. An hour in, suggestions get worse and latency spikes. Happens faster with non-Claude models.
This is context pollution. Claude Code searches files sequentially—each search loads more content into context, even irrelevant matches. The signal-to-noise ratio drops until the model is reasoning over garbage.
- Run
/compactproactively—every 10-15 turns, not just when prompted - Use
/clearwhen switching to unrelated tasks - Be specific in searches: "authentication error handler in auth.ts" beats "where do we handle errors"
Model Just Talks Instead of Acting
You: "Read the config file and update the timeout."
Model: "I would read the config file and then update the timeout to..."
Dead giveaway that tool calling isn't working. The model can't execute operations, so it describes what it would do.
- Verify your model supports function/tool calling (check OpenRouter's model list)
- GPT-5.5, Gemini 3.1 Pro, Claude: work. Many small local models: don't
- If using local models, accept that you're getting a generation-only experience
Authentication/Connection Errors
# Most common issue: conflicting API key
export ANTHROPIC_API_KEY="" # Must be empty, not unset
# Check your variables are set correctly
echo "Base URL: $ANTHROPIC_BASE_URL"
echo "Auth Token: $ANTHROPIC_AUTH_TOKEN"
# Common mistake: trailing slash breaks it
# Wrong: https://openrouter.ai/api/
# Right: https://openrouter.ai/api
# Restart your terminal after changes
source ~/.zshrc # or ~/.bashrcFrequently Asked Questions
How do I set a custom model in Claude Code?
Set ANTHROPIC_BASE_URL to a gateway that speaks the Anthropic Messages API, ANTHROPIC_AUTH_TOKEN to that gateway's key, ANTHROPIC_MODEL to your main model name, and ANTHROPIC_SMALL_FAST_MODEL to a cheaper background model. Claude Code hardcodes the names sonnet, opus, and haiku; these two model variables remap them to whatever the gateway exposes, including custom or open-weight models. GitHub issue 12969 tracks making model IDs fully configurable.
What is ANTHROPIC_AUTH_TOKEN in Claude Code?
ANTHROPIC_AUTH_TOKEN is the bearer token Claude Code sends to whatever endpoint ANTHROPIC_BASE_URL points at. Against api.anthropic.com it is your Anthropic key; against a gateway it is that gateway's key (an OpenRouter key, or any string for a local LiteLLM proxy). Set ANTHROPIC_AUTH_TOKEN, not ANTHROPIC_API_KEY, when using a third-party base URL, and leave ANTHROPIC_API_KEY empty so Claude Code does not fall back to Anthropic auth.
How do I use Claude Code with an LLM gateway?
Run a gateway that exposes the Anthropic Messages API (/v1/messages), then set ANTHROPIC_BASE_URL to it. OpenRouter, Z.AI, and DeepSeek expose this format natively. For OpenAI-format or self-hosted models, run LiteLLM as the gateway: it translates the Anthropic Messages API into the upstream format and forwards the request. Claude Code only ever talks to the gateway.
Does Claude Code work with LiteLLM?
Yes. LiteLLM is the standard Anthropic-Messages-to-anything translation layer for Claude Code. Define a model_list mapping a model name (for example claude-sonnet-4-5) to an upstream provider, start the proxy, then set ANTHROPIC_BASE_URL to http://localhost:4000. This is also how you reach OpenAI-format providers, Bedrock, Azure, Ollama, and Morph-served open models, since none of them expose /v1/messages directly.
What is the Claude Code LLM proxy setup?
Install LiteLLM, write a litellm_config.yaml model_list, start it with litellm --config litellm_config.yaml --port 4000, then export ANTHROPIC_BASE_URL=http://localhost:4000, ANTHROPIC_AUTH_TOKEN to any value, and ANTHROPIC_MODEL to a name in your model_list. The proxy receives Anthropic Messages requests from Claude Code and forwards them to the configured upstream.
Can Claude Code work with any LLM?
Claude Code can reach any model behind a gateway that speaks the Anthropic Messages API, but full functionality requires tool/function calling support. Models without tool calling can only return text, not execute file operations or commands.
Will using a different model with Claude Code cost less than Claude?
It depends on the model. GPT-5.5 is roughly comparable to Claude Sonnet on price. Open models served cheaply move the math: morph-dsv4flash (DeepSeek V4 Flash) is $0.139/M input and $0.278/M output, about 21x cheaper than Sonnet's $3/M input. Routing easy turns to a model like that and reserving Opus for hard ones is where the savings come from.
Why do my edits fail more often with alternative models?
Claude Code's edit format was tuned on Claude's output. Other models generate diffs with wrong line numbers, bad whitespace, or mismatched context lines that fail to apply. A dedicated edit layer like Morph FastApply takes the model's raw diff and re-applies it with a purpose-built merge model, which raises apply success above raw application.
What Actually Works (After Testing All This)
We've run Claude Code with probably a dozen different model backends at this point. Here's what we've learned:
Claude is still the best experience. The tool use just works. Edits apply cleanly. If you can afford it and don't have compliance restrictions, stick with Claude.
GPT-5.5 is the closest alternative. Tool calling works. Edits mostly apply. You hit somewhat more failures than Claude, which is livable, and the price is comparable.
Everything else is a tradeoff. Gemini's massive context is great for reading large codebases, but its edit success rate is lower. Qwen writes decent code but struggles with complex tool chains. Local models are free but feel like you're back in 2023.
The real killer is edit failures. When edits fail, you enter a retry loop that burns through tokens and time. We measured this: raw model diffs hit about 70-80% accuracy depending on the model. That 20-30% failure rate compounds fast over a session.
Two things help:
- Run /compact aggressively. Don't wait for Claude Code to auto-compact. Do it yourself every 10-15 turns, especially with alternative models that pollute context faster.
- Offload editing to specialized tools. The Morph MCP server intercepts edit operations and handles merging with a purpose-built model, so the diff applies cleanly regardless of which model generated it. Search goes through WarpGrep, which filters results before they hit your context.
The second option matters more as you scale. Doing 5 edits? Raw model is fine. Doing 50 edits in a session? You want something that doesn't make you manually fix every third one.
Try Morph Sub-Agents with Any Model
FastApply and WarpGrep work with Claude Code regardless of which LLM backend you're using. Improve edit accuracy and reduce context rot.
