VOOZH about

URL: https://www.morphllm.com/use-different-llm-claude-code

⇱ Use a Different LLM (Custom Model) with Claude Code | 2026 Guide | Morph


Use a Different LLM (Custom Model) with Claude Code: 2026 Guide

Set a custom model in Claude Code with four env vars: ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, ANTHROPIC_MODEL, ANTHROPIC_SMALL_FAST_MODEL. Point them at any gateway that speaks the Anthropic Messages API to run GPT, Gemini, DeepSeek, Qwen, or local models.

June 18, 2026 · 2 min read

Quick Answer

Last updated June 2026

TL;DR

Set a custom model in Claude Code with four environment variables. Point ANTHROPIC_BASE_URL at any gateway that speaks the Anthropic Messages API (/v1/messages), set the token, then name the main model and the background model. Claude Code routes every request to that gateway instead of api.anthropic.com.

export ANTHROPIC_BASE_URL="https://openrouter.ai/api" # any Anthropic-Messages gateway
export ANTHROPIC_AUTH_TOKEN="sk-or-v1-your-key" # the gateway key
export ANTHROPIC_MODEL="anthropic/claude-sonnet-4.5" # main model
export ANTHROPIC_SMALL_FAST_MODEL="anthropic/claude-haiku-4.5" # background tasks
export ANTHROPIC_API_KEY="" # keep empty so Claude Code does not fall back to Anthropic auth

OpenRouter (200+ models), Z.AI, and DeepSeek expose the Anthropic Messages format natively, so no proxy is needed. For OpenAI-format, self-hosted, or local models like Ollama, run a LiteLLM gateway: Claude Code talks to LiteLLM, LiteLLM forwards to the upstream provider.

Set a Custom Model in Claude Code

Claude Code hardcodes three model names internally: sonnet, opus, and haiku. To run a custom or open-weight model, you remap those names through ANTHROPIC_MODEL (the main model) and ANTHROPIC_SMALL_FAST_MODEL (background tasks like summarization and title generation). Both point at names your gateway exposes. GitHub issue 12969 tracks making the model ID fully configurable, including AWS China regions and fine-tuned models.

Custom model override

# Remap the hardcoded names to whatever the gateway exposes
export ANTHROPIC_MODEL="deepseek-v4-pro" # used wherever Claude Code asks for sonnet/opus
export ANTHROPIC_SMALL_FAST_MODEL="deepseek-v4-flash" # used wherever it asks for haiku
claude

Setting ANTHROPIC_SMALL_FAST_MODEL matters more than people expect. Claude Code fires the small-fast model constantly for non-coding chores. Pointing it at a cheap tier keeps those off your expensive model. Leave it unset and Claude Code falls back to a Claude haiku alias your gateway may not have.

Pick a Method: Provider Decision Table

The right method depends on whether your provider already speaks the Anthropic Messages API. If it does, point Claude Code straight at it. If it speaks OpenAI format or runs locally, put a LiteLLM gateway in front.

MethodAuthEnv vars/v1/messages native?ModelsCost leverBest for
Direct OpenRouterOpenRouter keyBASE_URL + AUTH_TOKEN + MODELYes (Anthropic skin)200+Volume aggregationFastest switch, widest catalog
LiteLLM gatewayGateway key (any value local)All four + model_listYes (LiteLLM translates)Any providerRouting + fallbacks + capsOpenAI-format, self-hosted, mixed
Claude Code RouterPer-provider keysConfig file, no BASE_URLYes (router serves it)ManyPer-task routing rulesRule-based provider switching
Morph open models via gatewayMorph keyAll four (via LiteLLM)No (OpenAI-only upstream)DeepSeek V4, Qwen, MiniMaxbf16 + cheap open weightsCheapest high-quality open models

Morph serves open models on an OpenAI-compatible endpoint at https://api.morphllm.com/v1 only. There is no /v1/messages and no Anthropic path, so to use a Morph-served model inside Claude Code you forward to it through a LiteLLM gateway: LiteLLM is the Anthropic-compatible shim, Morph is the upstream model. morph-dsv4flash (DeepSeek V4 Flash) is $0.139/M input and $0.278/M output, about 21x cheaper than Claude Sonnet's $3/M input.

Why Developers Switch to Different Models

Claude Code ships locked to Anthropic's API. For most people, that's fine. But you'll want an escape hatch when:

  • Rate limits hit hard. Anthropic's per-minute caps are real. During crunch time, you'll hit them. One team we talked to started routing overflow to GPT-5.5 just to keep shipping.
  • Budget constraints. Claude Sonnet runs about $3/M input tokens. DeepSeek V4 Flash served by Morph is $0.139/M input, about 21x less. For bulk refactoring or test generation, that gap matters.
  • Air-gapped environments. Defense contractors and healthcare orgs often can't send code to external APIs. Period.
  • Curious about alternatives. Qwen3-Coder is surprisingly good at Python. Gemini's 1M context handles massive monorepos. You won't know until you try.

A setup we've seen work: GPT-5.5 for architecture discussions and planning, Claude for implementation (still the strongest at tool use), and a local 14B model for quick questions while offline. The gateway layer makes switching trivial.

Fair Warning

Claude Code was purpose-built for Claude models. According to Anthropic's engineering blog, the agent relies heavily on tool calling patterns that Claude handles natively. Other models can struggle with:

  • Multi-step tool chains (especially the "read file → think → edit file" loop)
  • Generating diffs that actually apply cleanly
  • Extended thinking and artifacts (Opus-only features anyway)

GPT-5.5 and Gemini 3.1 Pro are solid alternatives for most workflows.

Method 1: Direct API Configuration

The simplest approach works with providers that offer Anthropic-compatible endpoints. No proxy needed.

Providers with Native Anthropic API Support

ProviderBase URLNotes
OpenRouterhttps://openrouter.ai/api200+ models, Anthropic skin built-in
Z.AI (GLM)https://api.z.ai/api/anthropicChinese models, global access
Amazon BedrockVia LiteLLM proxyEnterprise, requires IAM setup
Azure OpenAIVia LiteLLM proxyEnterprise, requires configuration

Step-by-Step Setup

1. Get your API key from your chosen provider.

2. Set environment variables in your terminal:

Terminal Configuration

# For OpenRouter
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="sk-or-v1-your-key"

# Prevent Claude Code from trying Anthropic auth
export ANTHROPIC_API_KEY=""

3. Make it permanent by adding to your shell profile:

Shell Profile (~/.zshrc)

# Add to ~/.bashrc, ~/.zshrc, or ~/.config/fish/config.fish
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="sk-or-v1-your-key"
export ANTHROPIC_API_KEY=""

4. Verify the connection:

# Restart your terminal, then run Claude Code
claude

# Inside Claude Code, check status
/status

Common Mistake

Don't put these in a project .env file. Claude Code doesn't read standard .env files. Use your shell profile or set them in the terminal directly.

Method 2: LiteLLM Proxy

LiteLLM acts as a universal translator between Claude Code and any LLM provider. It handles the API format conversion automatically.

Use this method when:

  • Your provider only offers OpenAI-format APIs
  • You want to route between multiple providers with fallbacks
  • You need usage tracking and cost controls
  • You're connecting to self-hosted models

Quick Setup

LiteLLM Installation

# Install LiteLLM
pip install litellm

# Create config file
cat > litellm_config.yaml << 'EOF'
model_list:
 - model_name: claude-sonnet-4-5-20250929
 litellm_params:
 model: openrouter/anthropic/claude-sonnet-4.5
 api_key: sk-or-v1-your-key

 - model_name: gpt-5.5
 litellm_params:
 model: openai/gpt-5.5
 api_key: sk-your-openai-key

 - model_name: morph-dsv4flash
 litellm_params:
 model: openai/morph-dsv4flash
 api_base: https://api.morphllm.com/v1
 api_key: sk-your-morph-key
EOF

# Start the proxy
litellm --config litellm_config.yaml --port 4000

Connect Claude Code to LiteLLM

export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="sk-1234" # Any value works locally
export ANTHROPIC_MODEL="claude-sonnet-4-5-20250929" # Or gpt-5.5, morph-dsv4flash
export ANTHROPIC_SMALL_FAST_MODEL="morph-dsv4flash" # cheap model for background tasks

# Now run Claude Code
claude

The proxy translates requests on the fly. Claude Code thinks it's talking to Anthropic, but your requests go to whatever model you configured.

Shell Aliases for Fast Switching

Quick Model Switching

# Add to ~/.zshrc or ~/.bashrc
alias claude-gpt='ANTHROPIC_MODEL=gpt-5.5 claude'
alias claude-ds='ANTHROPIC_MODEL=morph-dsv4flash claude'
alias claude-local='ANTHROPIC_MODEL=ollama/codellama claude'

# Usage
claude-gpt # Starts Claude Code with GPT-5.5
claude-ds # Starts Claude Code with DeepSeek V4 Flash

Method 3: OpenRouter Direct Connection

OpenRouter provides an "Anthropic skin" that speaks Claude Code's native protocol. No proxy needed, direct connection with access to 200+ models.

Setup

# Get your key from openrouter.ai/keys
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"
export ANTHROPIC_AUTH_TOKEN="sk-or-v1-your-key"
export ANTHROPIC_API_KEY=""

# Start Claude Code
claude

Selecting Models

Use the /model command inside Claude Code to switch between OpenRouter models:

# Inside Claude Code
/model anthropic/claude-sonnet-4.5 # Claude Sonnet
/model openai/gpt-5.5 # GPT-5.5
/model google/gemini-3.1-pro # Gemini 3.1 Pro
/model qwen/qwen3-coder # Qwen3-Coder

OpenRouter Pricing

OpenRouter often beats direct API pricing through volume aggregation. Check openrouter.ai/models for current rates. Some models are free for limited use.

Method 4: anyclaude Wrapper

anyclaude is a drop-in wrapper from Coder that makes multi-provider switching dead simple. No proxy server, no config files—just environment variables.

anyclaude Setup

# Install with your package manager
bun add -g anyclaude # or npm, pnpm

# Set your keys
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."
export XAI_API_KEY="..."

# Run with any model using prefixes
anyclaude --model openai/gpt-5.5
anyclaude --model google/gemini-3.1-pro
anyclaude --model xai/grok-4

The prefix tells anyclaude which provider to route to. It handles the API translation under the hood. For OpenRouter models, set OPENAI_API_URL to their endpoint.

This is probably the fastest way to test different models without committing to a full proxy setup. The tradeoff: less flexibility than LiteLLM for complex routing rules.

Method 5: Local LLMs with Ollama

Running models locally eliminates API costs and keeps your code private. The tradeoff: you need decent hardware and most local models don't support tool calling well.

Hardware Requirements

Model SizeVRAM NeededExample Models
7B parameters8GBCodeLlama-7B, Qwen2.5-Coder-7B
14B parameters16GBQwen2.5-Coder-14B
32B parameters24GBQwen2.5-Coder-32B
70B+ parameters48GB+CodeLlama-70B, DeepSeek-Coder-V2

Setup with Ollama + LiteLLM

Local Model Setup

# 1. Install and start Ollama
brew install ollama # macOS
ollama serve

# 2. Pull a coding model
ollama pull qwen2.5-coder:14b

# 3. Configure LiteLLM to use Ollama
cat > litellm_config.yaml << 'EOF'
model_list:
 - model_name: claude-sonnet-4-5-20250929
 litellm_params:
 model: ollama/qwen2.5-coder:14b
 api_base: http://localhost:11434
EOF

# 4. Start proxy
litellm --config litellm_config.yaml --port 4000

# 5. Connect Claude Code
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="local"
claude

Tool Calling Limitations

Most local models have weak or no tool calling support. This means Claude Code can generate code suggestions but can't automatically read files, run commands, or apply edits. You'll need to copy-paste more manually.

Model Compatibility Matrix

Claude Code's full feature set requires tool calling, large context windows, and reliable instruction following. Here's how popular models stack up:

ModelTool CallingContextEdit ReliabilityBest For
Claude Sonnet 4.5Excellent200KHighestFull Claude Code experience
GPT-5.5Good256KHighPlanning, complex reasoning
GPT-5.5-miniGood256KHighCost-efficient tasks
Gemini 3.1 ProGood1MMedium-highLarge codebase analysis
DeepSeek V4-ProGood1MMedium-highCheap agentic coding
Qwen3-CoderLimited256KMediumCode generation
Local 7B modelsPoor8-32KLowQuick prototyping only

"Edit Reliability" is a relative ranking of how often a model's suggested changes apply cleanly without manual intervention, based on hands-on use, not a single published benchmark. Lower reliability means more failed edits and retry loops. Treat the ordering as directional.

Route by Difficulty Instead of Picking One Model

Once Claude Code can talk to any provider, the better setup is not picking a single replacement model. It is routing each turn to the right one. Most requests to a coding agent are routine: renaming a variable, formatting output, writing a boilerplate test, summarizing a diff. Those run identically on a cheap model as on a frontier one. 60-80% of coding-agent requests fall into that bucket, so sending all of them to Opus or GPT-5.5 burns money for no quality gain. For a team, that is the difference between a bill that grows with headcount and one that grows with how hard everyone leans on the agent.

A router classifies each turn by difficulty, then your gateway calls the cheapest model that handles it. Easy work goes to Haiku 4.5 or an open model, hard work to Opus 4.7 or GPT-5.5. You set the policy once instead of switching models by hand mid-session. The classify step is a separate API call: it returns a model name, it does not proxy the LLM request. Your LiteLLM gateway (the one Claude Code points at via ANTHROPIC_BASE_URL) reads that name and forwards the turn to the chosen upstream.

Routing matrix
Each classified difficulty × ambiguity maps to a Claude model and reasoning effort. Domain overrides match first, then this grid.
Difficulty \ Ambiguity
Clear
Some ambiguity
Vague
Easy
Trivial edits, formatting, simple lookups
Haiku
Low
Haiku
Medium
Haiku
High
Medium
Typical feature work and multi-file edits
Sonnet
Low
Sonnet
Medium
Sonnet
High
Hard
Architecture, tricky debugging, large refactors
Opus
Low
Opus
Medium
Opus
High
Domain overrides
Summary
Haiku
Medium
A routing policy maps each difficulty × ambiguity to a model tier and reasoning effort. The example stays inside the Claude family, but the same grid routes across providers.
60-80%
Coding-agent requests that are routine
40-70%
Token spend cut by routing by difficulty
~430ms
Router classification latency per turn
$0.001
Per classification request

Morph Router exposes POST /v1/router/classify and /v1/router/multimodel. classify returns difficulty, ambiguity, and domain signals; multimodel returns a single recommended model name. Both classify-only: the router does not proxy the LLM request, so you call the returned model yourself (through your gateway, or your own SDK). That keeps the contract and data inside whatever provider you already use. For a team standardizing the setup, see the Claude Code Enterprise writeup on per-seat controls, or the cost optimization guide.

Fixing Common Problems

These come up constantly when troubleshooting alternative model setups. Most are fixable in under a minute.

"Cannot find matching context" / Edits Keep Failing

You ask for a change. The model generates what looks like valid code. Claude Code rejects it. This is the most common issue—GitHub is full of these reports.

The root cause: Claude Code's edit system expects diffs in a specific format. Claude generates them correctly because Anthropic trained it that way. Other models approximate but miss details—wrong line numbers, bad whitespace handling, mismatched context lines.

Quick fixes:

  • Ask for smaller edits. "Change the function signature" works better than "refactor this file"
  • Say "read the file first" before asking for edits. Forces the model to refresh its view
  • Use /compact between major changes to reset context

Proper fix: Use a dedicated code editing layer (like the Morph MCP) that intercepts edit operations and re-applies them with a purpose-built merge model. It takes the raw model diff and applies it cleanly, which removes most of the format-mismatch failures you get from applying alternative-model diffs directly.

Sessions Degrade Over Time

Start fresh, everything works. An hour in, suggestions get worse and latency spikes. Happens faster with non-Claude models.

This is context pollution. Claude Code searches files sequentially—each search loads more content into context, even irrelevant matches. The signal-to-noise ratio drops until the model is reasoning over garbage.

  • Run /compact proactively—every 10-15 turns, not just when prompted
  • Use /clear when switching to unrelated tasks
  • Be specific in searches: "authentication error handler in auth.ts" beats "where do we handle errors"

Model Just Talks Instead of Acting

You: "Read the config file and update the timeout."
Model: "I would read the config file and then update the timeout to..."

Dead giveaway that tool calling isn't working. The model can't execute operations, so it describes what it would do.

  • Verify your model supports function/tool calling (check OpenRouter's model list)
  • GPT-5.5, Gemini 3.1 Pro, Claude: work. Many small local models: don't
  • If using local models, accept that you're getting a generation-only experience

Authentication/Connection Errors

# Most common issue: conflicting API key
export ANTHROPIC_API_KEY="" # Must be empty, not unset

# Check your variables are set correctly
echo "Base URL: $ANTHROPIC_BASE_URL"
echo "Auth Token: $ANTHROPIC_AUTH_TOKEN"

# Common mistake: trailing slash breaks it
# Wrong: https://openrouter.ai/api/
# Right: https://openrouter.ai/api

# Restart your terminal after changes
source ~/.zshrc # or ~/.bashrc

Frequently Asked Questions

How do I set a custom model in Claude Code?

Set ANTHROPIC_BASE_URL to a gateway that speaks the Anthropic Messages API, ANTHROPIC_AUTH_TOKEN to that gateway's key, ANTHROPIC_MODEL to your main model name, and ANTHROPIC_SMALL_FAST_MODEL to a cheaper background model. Claude Code hardcodes the names sonnet, opus, and haiku; these two model variables remap them to whatever the gateway exposes, including custom or open-weight models. GitHub issue 12969 tracks making model IDs fully configurable.

What is ANTHROPIC_AUTH_TOKEN in Claude Code?

ANTHROPIC_AUTH_TOKEN is the bearer token Claude Code sends to whatever endpoint ANTHROPIC_BASE_URL points at. Against api.anthropic.com it is your Anthropic key; against a gateway it is that gateway's key (an OpenRouter key, or any string for a local LiteLLM proxy). Set ANTHROPIC_AUTH_TOKEN, not ANTHROPIC_API_KEY, when using a third-party base URL, and leave ANTHROPIC_API_KEY empty so Claude Code does not fall back to Anthropic auth.

How do I use Claude Code with an LLM gateway?

Run a gateway that exposes the Anthropic Messages API (/v1/messages), then set ANTHROPIC_BASE_URL to it. OpenRouter, Z.AI, and DeepSeek expose this format natively. For OpenAI-format or self-hosted models, run LiteLLM as the gateway: it translates the Anthropic Messages API into the upstream format and forwards the request. Claude Code only ever talks to the gateway.

Does Claude Code work with LiteLLM?

Yes. LiteLLM is the standard Anthropic-Messages-to-anything translation layer for Claude Code. Define a model_list mapping a model name (for example claude-sonnet-4-5) to an upstream provider, start the proxy, then set ANTHROPIC_BASE_URL to http://localhost:4000. This is also how you reach OpenAI-format providers, Bedrock, Azure, Ollama, and Morph-served open models, since none of them expose /v1/messages directly.

What is the Claude Code LLM proxy setup?

Install LiteLLM, write a litellm_config.yaml model_list, start it with litellm --config litellm_config.yaml --port 4000, then export ANTHROPIC_BASE_URL=http://localhost:4000, ANTHROPIC_AUTH_TOKEN to any value, and ANTHROPIC_MODEL to a name in your model_list. The proxy receives Anthropic Messages requests from Claude Code and forwards them to the configured upstream.

Can Claude Code work with any LLM?

Claude Code can reach any model behind a gateway that speaks the Anthropic Messages API, but full functionality requires tool/function calling support. Models without tool calling can only return text, not execute file operations or commands.

Will using a different model with Claude Code cost less than Claude?

It depends on the model. GPT-5.5 is roughly comparable to Claude Sonnet on price. Open models served cheaply move the math: morph-dsv4flash (DeepSeek V4 Flash) is $0.139/M input and $0.278/M output, about 21x cheaper than Sonnet's $3/M input. Routing easy turns to a model like that and reserving Opus for hard ones is where the savings come from.

Why do my edits fail more often with alternative models?

Claude Code's edit format was tuned on Claude's output. Other models generate diffs with wrong line numbers, bad whitespace, or mismatched context lines that fail to apply. A dedicated edit layer like Morph FastApply takes the model's raw diff and re-applies it with a purpose-built merge model, which raises apply success above raw application.

What Actually Works (After Testing All This)

We've run Claude Code with probably a dozen different model backends at this point. Here's what we've learned:

Claude is still the best experience. The tool use just works. Edits apply cleanly. If you can afford it and don't have compliance restrictions, stick with Claude.

GPT-5.5 is the closest alternative. Tool calling works. Edits mostly apply. You hit somewhat more failures than Claude, which is livable, and the price is comparable.

Everything else is a tradeoff. Gemini's massive context is great for reading large codebases, but its edit success rate is lower. Qwen writes decent code but struggles with complex tool chains. Local models are free but feel like you're back in 2023.

The real killer is edit failures. When edits fail, you enter a retry loop that burns through tokens and time. We measured this: raw model diffs hit about 70-80% accuracy depending on the model. That 20-30% failure rate compounds fast over a session.

Two things help:

  1. Run /compact aggressively. Don't wait for Claude Code to auto-compact. Do it yourself every 10-15 turns, especially with alternative models that pollute context faster.
  2. Offload editing to specialized tools. The Morph MCP server intercepts edit operations and handles merging with a purpose-built model, so the diff applies cleanly regardless of which model generated it. Search goes through WarpGrep, which filters results before they hit your context.

The second option matters more as you scale. Doing 5 edits? Raw model is fine. Doing 50 edits in a session? You want something that doesn't make you manually fix every third one.

Try Morph Sub-Agents with Any Model

FastApply and WarpGrep work with Claude Code regardless of which LLM backend you're using. Improve edit accuracy and reduce context rot.