VOOZH about

URL: https://docs.litellm.ai/docs/providers/anthropic

⇱ Anthropic | liteLLM


Skip to main content

LiteLLM supports all anthropic models.

  • claude-opus-4-6 (claude-opus-4-6-20260205)
  • claude-sonnet-4-6
  • claude-sonnet-4-5-20250929
  • claude-opus-4-5-20251101
  • claude-opus-4-1-20250805
  • claude-4 (claude-opus-4-20250514, claude-sonnet-4-20250514)
  • claude-3.7 (claude-3-7-sonnet-20250219)
  • claude-3.5 (claude-3-5-sonnet-20240620)
  • claude-3 (claude-3-haiku-20240307, claude-3-opus-20240229, claude-3-sonnet-20240229)
  • claude-2
  • claude-2.1
  • claude-instant-1.2
PropertyDetails
DescriptionClaude is a highly performant, trustworthy, and intelligent AI platform built by Anthropic. Claude excels at tasks involving language, reasoning, analysis, coding, and more. Also available via Azure Foundry.
Provider Route on LiteLLManthropic/ (add this prefix to the model name, to route any requests to Anthropic - e.g. anthropic/claude-3-5-sonnet-20240620). For Azure Foundry deployments, use azure/claude-* (see Azure Anthropic documentation)
Provider DocAnthropic ↗, Azure Foundry Claude ↗
API Endpoint for Providerhttps://api.anthropic.com (or Azure Foundry endpoint: https://<resource-name>.services.ai.azure.com/anthropic)
Supported Endpoints/chat/completions, /v1/messages (passthrough)

Supported OpenAI Parameters

Check this in code, here

"stream",
"stop",
"temperature",
"top_p",
"max_tokens",
"max_completion_tokens",
"tools",
"tool_choice",
"extra_headers",
"parallel_tool_calls",
"response_format",
"user",
"reasoning_effort",
info

Notes:

  • Anthropic API fails requests when max_tokens are not passed. Due to this litellm passes max_tokens=4096 when no max_tokens are passed.
  • response_format is fully supported for Claude Sonnet 4.5 and Opus 4.1 models (see Structured Outputs section)
  • reasoning_effort is automatically mapped to output_config={"effort": ...} for Claude 4.6 and Opus 4.5 models (see Effort Parameter)

Structured Outputs

LiteLLM supports Anthropic's structured outputs feature for Claude Sonnet 4.5 and Opus 4.1 models. When you use response_format with these models, LiteLLM automatically:

  • Adds the required structured-outputs-2025-11-13 beta header
  • Transforms OpenAI's response_format to Anthropic's output_format format

Supported Models

  • sonnet-4-5 or sonnet-4.5 (all Sonnet 4.5 variants)
  • opus-4-1 or opus-4.1 (all Opus 4.1 variants)
    • opus-4-5 or opus-4.5 (all Opus 4.5 variants)

Example Usage

  • LiteLLM SDK
  • LiteLLM Proxy
from litellm import completion

response = completion(
model="claude-sonnet-4-5-20250929",
messages=[{"role":"user","content":"What is the capital of France?"}],
response_format={
"type":"json_schema",
"json_schema":{
"name":"capital_response",
"strict":True,
"schema":{
"type":"object",
"properties":{
"country":{"type":"string"},
"capital":{"type":"string"}
},
"required":["country","capital"],
"additionalProperties":False
}
}
}
)

print(response.choices[0].message.content)
# Output: {"country": "France", "capital": "Paris"}
  1. Setup config.yaml
model_list:
-model_name: claude-sonnet-4-5
litellm_params:
model: anthropic/claude-sonnet-4-5-20250929
api_key: os.environ/ANTHROPIC_API_KEY
  1. Start proxy
litellm --config /path/to/config.yaml
  1. Test it!
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LITELLM_KEY" \
-d '{
"model": "claude-sonnet-4-5",
"messages": [{"role": "user", "content": "What is the capital of France?"}],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "capital_response",
"strict": true,
"schema": {
"type": "object",
"properties": {
"country": {"type": "string"},
"capital": {"type": "string"}
},
"required": ["country", "capital"],
"additionalProperties": false
}
}
}
}'
info

When using structured outputs with supported models, LiteLLM automatically:

  • Converts OpenAI's response_format to Anthropic's output_schema
  • Adds the anthropic-beta: structured-outputs-2025-11-13 header
  • Creates a tool with the schema and forces the model to use it

API Keys

import os

os.environ["ANTHROPIC_API_KEY"]="your-api-key"
# os.environ["ANTHROPIC_API_BASE"] = "" # [OPTIONAL] or 'ANTHROPIC_BASE_URL'
# os.environ["LITELLM_ANTHROPIC_DISABLE_URL_SUFFIX"] = "true" # [OPTIONAL] Disable automatic URL suffix appending
Azure Foundry Support

Claude models are also available via Microsoft Azure Foundry. Use the azure/ prefix instead of anthropic/ and configure Azure authentication. See the Azure Anthropic documentation for details.

Example:

response = completion(
model="azure/claude-sonnet-4-5",
api_base="https://<resource-name>.services.ai.azure.com/anthropic",
api_key="your-azure-api-key",
messages=[{"role":"user","content":"Hello!"}]
)

Custom API Base

When using a custom API base for Anthropic (e.g., a proxy or custom endpoint), LiteLLM automatically appends the appropriate suffix (/v1/messages or /v1/complete) to your base URL.

If your custom endpoint already includes the full path or doesn't follow Anthropic's standard URL structure, you can disable this automatic suffix appending:

import os

os.environ["ANTHROPIC_API_BASE"]="https://my-custom-endpoint.com/custom/path"
os.environ["LITELLM_ANTHROPIC_DISABLE_URL_SUFFIX"]="true"# Prevents automatic suffix

Without LITELLM_ANTHROPIC_DISABLE_URL_SUFFIX:

  • Base URL https://my-proxy.comhttps://my-proxy.com/v1/messages
  • Base URL https://my-proxy.com/apihttps://my-proxy.com/api/v1/messages

With LITELLM_ANTHROPIC_DISABLE_URL_SUFFIX=true:

  • Base URL https://my-proxy.com/custom/pathhttps://my-proxy.com/custom/path (unchanged)

Azure AI Foundry (Alternative Method)

Recommended Method

For full Azure support including Azure AD authentication, use the dedicated Azure Anthropic provider with azure_ai/ prefix.

As an alternative, you can use the anthropic/ provider directly with your Azure endpoint since Azure exposes Claude using Anthropic's native API.

from litellm import completion

response = completion(
model="anthropic/claude-sonnet-4-5",
api_base="https://<your-resource>.services.ai.azure.com/anthropic",
api_key="<your-azure-api-key>",
messages=[{"role":"user","content":"Hello!"}],
)
print(response)
info

Finding your Azure endpoint: Go to Azure AI Foundry → Your deployment → Overview. Your base URL will be https://<resource-name>.services.ai.azure.com/anthropic

Usage

import os
from litellm import completion

# set env - [OPTIONAL] replace with your anthropic key
os.environ["ANTHROPIC_API_KEY"]="your-api-key"

messages =[{"role":"user","content":"Hey! how's it going?"}]
response = completion(model="claude-opus-4-20250514", messages=messages)
print(response)

Usage - Streaming

Just set stream=True when calling completion.

import os
from litellm import completion

# set env
os.environ["ANTHROPIC_API_KEY"]="your-api-key"

messages =[{"role":"user","content":"Hey! how's it going?"}]
response = completion(model="claude-opus-4-20250514", messages=messages, stream=True)
for chunk in response:
print(chunk["choices"][0]["delta"]["content"])# same as openai format

Usage with LiteLLM Proxy

Here's how to call Anthropic with the LiteLLM Proxy Server

1. Save key in your environment

export ANTHROPIC_API_KEY="your-api-key"

2. Start the proxy

  • config.yaml
  • config - default all Anthropic Model
  • cli
model_list:
-model_name: claude-4### RECEIVED MODEL NAME ###
litellm_params:# all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
model: claude-opus-4-20250514### MODEL NAME sent to `litellm.completion()` ###
api_key:"os.environ/ANTHROPIC_API_KEY"# does os.getenv("ANTHROPIC_API_KEY")
litellm --config /path/to/config.yaml

Use this if you want to make requests to claude-3-haiku-20240307,claude-3-opus-20240229,claude-2.1 without defining them on the config.yaml

Required env variables

ANTHROPIC_API_KEY=sk-ant****
model_list:
-model_name:"*"
litellm_params:
model:"*"
litellm --config /path/to/config.yaml

Example Request for this config.yaml

Ensure you use anthropic/ prefix to route the request to Anthropic API

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
"model": "anthropic/claude-3-haiku-20240307",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}
'
$ litellm --model claude-opus-4-20250514

# Server running on http://0.0.0.0:4000

3. Test it

  • Curl Request
  • OpenAI v1.0.0+
  • Langchain
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
"model": "claude-3",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}
'
import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:4000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="claude-3", messages =[
{
"role":"user",
"content":"this is a test request, write a short poem"
}
])

print(response)

from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import(
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage

chat = ChatOpenAI(
openai_api_base="http://0.0.0.0:4000",# set openai_api_base to the LiteLLM Proxy
model ="claude-3",
temperature=0.1
)

messages =[
SystemMessage(
content="You are a helpful assistant that im using to make a test request to."
),
HumanMessage(
content="test from litellm. tell me why it's amazing in 1 sentence"
),
]
response = chat(messages)

print(response)

Supported Models

Model Name 👉 Human-friendly name.
Function Call 👉 How to call the model in LiteLLM.

Model NameFunction Call
claude-opus-4-6completion('claude-opus-4-6-20260205', messages)
claude-sonnet-4-5completion('claude-sonnet-4-5-20250929', messages)
claude-opus-4-5completion('claude-opus-4-5-20251101', messages)
claude-opus-4-1completion('claude-opus-4-1-20250805', messages)
claude-opus-4completion('claude-opus-4-20250514', messages)
claude-sonnet-4completion('claude-sonnet-4-20250514', messages)
claude-3.7completion('claude-3-7-sonnet-20250219', messages)
claude-3-5-sonnetcompletion('claude-3-5-sonnet-20240620', messages)
claude-3-haikucompletion('claude-3-haiku-20240307', messages)
claude-3-opuscompletion('claude-3-opus-20240229', messages)
claude-3-5-sonnet-20240620completion('claude-3-5-sonnet-20240620', messages)
claude-3-sonnetcompletion('claude-3-sonnet-20240229', messages)
claude-2.1completion('claude-2.1', messages)
claude-2completion('claude-2', messages)
claude-instant-1.2completion('claude-instant-1.2', messages)
claude-instant-1completion('claude-instant-1', messages)

Prompt Caching

Use Anthropic Prompt Caching

Relevant Anthropic API Docs

note

Here's what a sample Raw Request from LiteLLM for Anthropic Context Caching looks like:

POST Request Sent from LiteLLM:
curl -X POST \
https://api.anthropic.com/v1/messages \
-H 'accept: application/json' -H 'anthropic-version: 2023-06-01' -H 'content-type: application/json' -H 'x-api-key: sk-...' \
-d '{'model': 'claude-3-5-sonnet-20240620', [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What are the key terms and conditions in this agreement?",
"cache_control": {
"type": "ephemeral"
}
}
]
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "Certainly! The key terms and conditions are the following: the contract is 1 year long for $10/mo"
}
]
}
],
"temperature": 0.2,
"max_tokens": 10
}'

Note: Anthropic no longer requires the anthropic-beta: prompt-caching-2024-07-31 header. Prompt caching now works automatically when you use cache_control in your messages.

Caching - Large Context Caching

This example demonstrates basic Prompt Caching usage, caching the full text of the legal agreement as a prefix while keeping the user instruction uncached.

  • LiteLLM SDK
  • LiteLLM Proxy
response =await litellm.acompletion(
model="anthropic/claude-3-5-sonnet-20240620",
messages=[
{
"role":"system",
"content":[
{
"type":"text",
"text":"You are an AI assistant tasked with analyzing legal documents.",
},
{
"type":"text",
"text":"Here is the full text of a complex legal agreement",
"cache_control":{"type":"ephemeral"},
},
],
},
{
"role":"user",
"content":"what are the key terms and conditions in this agreement?",
},
]
)

info

LiteLLM Proxy is OpenAI compatible

This is an example using the OpenAI Python SDK sending a request to LiteLLM Proxy

Assuming you have a model=anthropic/claude-3-5-sonnet-20240620 on the litellm proxy config.yaml

import openai
client = openai.AsyncOpenAI(
api_key="anything",# litellm proxy api key
base_url="http://0.0.0.0:4000"# litellm proxy base url
)


response =await client.chat.completions.create(
model="anthropic/claude-3-5-sonnet-20240620",
messages=[
{
"role":"system",
"content":[
{
"type":"text",
"text":"You are an AI assistant tasked with analyzing legal documents.",
},
{
"type":"text",
"text":"Here is the full text of a complex legal agreement",
"cache_control":{"type":"ephemeral"},
},
],
},
{
"role":"user",
"content":"what are the key terms and conditions in this agreement?",
},
]
)

Caching - Tools definitions

In this example, we demonstrate caching tool definitions.

The cache_control parameter is placed on the final tool

  • LiteLLM SDK
  • LiteLLM Proxy
import litellm

response =await litellm.acompletion(
model="anthropic/claude-3-5-sonnet-20240620",
messages =[{"role":"user","content":"What's the weather like in Boston today?"}]
tools =[
{
"type":"function",
"function":{
"name":"get_current_weather",
"description":"Get the current weather in a given location",
"parameters":{
"type":"object",
"properties":{
"location":{
"type":"string",
"description":"The city and state, e.g. San Francisco, CA",
},
"unit":{"type":"string","enum":["celsius","fahrenheit"]},
},
"required":["location"],
},
"cache_control":{"type":"ephemeral"}
},
}
]
)
info

LiteLLM Proxy is OpenAI compatible

This is an example using the OpenAI Python SDK sending a request to LiteLLM Proxy

Assuming you have a model=anthropic/claude-3-5-sonnet-20240620 on the litellm proxy config.yaml

import openai
client = openai.AsyncOpenAI(
api_key="anything",# litellm proxy api key
base_url="http://0.0.0.0:4000"# litellm proxy base url
)

response =await client.chat.completions.create(
model="anthropic/claude-3-5-sonnet-20240620",
messages =[{"role":"user","content":"What's the weather like in Boston today?"}]
tools =[
{
"type":"function",
"function":{
"name":"get_current_weather",
"description":"Get the current weather in a given location",
"parameters":{
"type":"object",
"properties":{
"location":{
"type":"string",
"description":"The city and state, e.g. San Francisco, CA",
},
"unit":{"type":"string","enum":["celsius","fahrenheit"]},
},
"required":["location"],
},
"cache_control":{"type":"ephemeral"}
},
}
]
)

Caching - Continuing Multi-Turn Convo

In this example, we demonstrate how to use Prompt Caching in a multi-turn conversation.

The cache_control parameter is placed on the system message to designate it as part of the static prefix.

The conversation history (previous messages) is included in the messages array. The final turn is marked with cache-control, for continuing in followups. The second-to-last user message is marked for caching with the cache_control parameter, so that this checkpoint can read from the previous cache.

  • LiteLLM SDK
  • LiteLLM Proxy
import litellm

response =await litellm.acompletion(
model="anthropic/claude-3-5-sonnet-20240620",
messages=[
# System Message
{
"role":"system",
"content":[
{
"type":"text",
"text":"Here is the full text of a complex legal agreement"
*400,
"cache_control":{"type":"ephemeral"},
}
],
},
# marked for caching with the cache_control parameter, so that this checkpoint can read from the previous cache.
{
"role":"user",
"content":[
{
"type":"text",
"text":"What are the key terms and conditions in this agreement?",
"cache_control":{"type":"ephemeral"},
}
],
},
{
"role":"assistant",
"content":"Certainly! the key terms and conditions are the following: the contract is 1 year long for $10/mo",
},
# The final turn is marked with cache-control, for continuing in followups.
{
"role":"user",
"content":[
{
"type":"text",
"text":"What are the key terms and conditions in this agreement?",
"cache_control":{"type":"ephemeral"},
}
],
},
]
)
info

LiteLLM Proxy is OpenAI compatible

This is an example using the OpenAI Python SDK sending a request to LiteLLM Proxy

Assuming you have a model=anthropic/claude-3-5-sonnet-20240620 on the litellm proxy config.yaml

import openai
client = openai.AsyncOpenAI(
api_key="anything",# litellm proxy api key
base_url="http://0.0.0.0:4000"# litellm proxy base url
)

response =await client.chat.completions.create(
model="anthropic/claude-3-5-sonnet-20240620",
messages=[
# System Message
{
"role":"system",
"content":[
{
"type":"text",
"text":"Here is the full text of a complex legal agreement"
*400,
"cache_control":{"type":"ephemeral"},
}
],
},
# marked for caching with the cache_control parameter, so that this checkpoint can read from the previous cache.
{
"role":"user",
"content":[
{
"type":"text",
"text":"What are the key terms and conditions in this agreement?",
"cache_control":{"type":"ephemeral"},
}
],
},
{
"role":"assistant",
"content":"Certainly! the key terms and conditions are the following: the contract is 1 year long for $10/mo",
},
# The final turn is marked with cache-control, for continuing in followups.
{
"role":"user",
"content":[
{
"type":"text",
"text":"What are the key terms and conditions in this agreement?",
"cache_control":{"type":"ephemeral"},
}
],
},
]
)

Function/Tool Calling

from litellm import completion

# set env
os.environ["ANTHROPIC_API_KEY"]="your-api-key"

tools =[
{
"type":"function",
"function":{
"name":"get_current_weather",
"description":"Get the current weather in a given location",
"parameters":{
"type":"object",
"properties":{
"location":{
"type":"string",
"description":"The city and state, e.g. San Francisco, CA",
},
"unit":{"type":"string","enum":["celsius","fahrenheit"]},
},
"required":["location"],
},
},
}
]
messages =[{"role":"user","content":"What's the weather like in Boston today?"}]

response = completion(
model="anthropic/claude-3-opus-20240229",
messages=messages,
tools=tools,
tool_choice="auto",
)
# Add any assertions, here to check response args
print(response)
assertisinstance(response.choices[0].message.tool_calls[0].function.name,str)
assertisinstance(
response.choices[0].message.tool_calls[0].function.arguments,str
)

Forcing Anthropic Tool Use

If you want Claude to use a specific tool to answer the user’s question

You can do this by specifying the tool in the tool_choice field like so:

response = completion(
model="anthropic/claude-3-opus-20240229",
messages=messages,
tools=tools,
tool_choice={"type":"tool","name":"get_weather"},
)

Disable Tool Calling

You can disable tool calling by setting the tool_choice to "none".

  • SDK
  • Proxy
from litellm import completion

response = completion(
model="anthropic/claude-3-opus-20240229",
messages=messages,
tools=tools,
tool_choice="none",
)

  1. Setup config.yaml
model_list:
-model_name: anthropic-claude-model
litellm_params:
model: anthropic/claude-3-opus-20240229
api_key: os.environ/ANTHROPIC_API_KEY
  1. Start proxy
litellm --config /path/to/config.yaml
  1. Test it!

Replace anything with your LiteLLM Proxy Virtual Key, if setup.

curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer anything" \
-d '{
"model": "anthropic-claude-model",
"messages": [{"role": "user", "content": "Who won the World Cup in 2022?"}],
"tools": [{"type": "mcp", "server_label": "deepwiki", "server_url": "https://mcp.deepwiki.com/mcp", "require_approval": "never"}],
"tool_choice": "none"
}'

MCP Tool Calling

Here's how to use MCP tool calling with Anthropic:

  • LiteLLM SDK
  • LiteLLM Proxy

LiteLLM supports MCP tool calling with Anthropic in the OpenAI Responses API format.

  • OpenAI Format
  • Anthropic Format
import os 
from litellm import completion

os.environ["ANTHROPIC_API_KEY"]="sk-ant-..."

tools=[
{
"type":"mcp",
"server_label":"deepwiki",
"server_url":"https://mcp.deepwiki.com/mcp",
"require_approval":"never",
},
]

response = completion(
model="anthropic/claude-sonnet-4-20250514",
messages=[{"role":"user","content":"Who won the World Cup in 2022?"}],
tools=tools
)
import os 
from litellm import completion

os.environ["ANTHROPIC_API_KEY"]="sk-ant-..."

tools =[
{
"type":"url",
"url":"https://mcp.deepwiki.com/mcp",
"name":"deepwiki-mcp",
}
]
response = completion(
model="anthropic/claude-sonnet-4-20250514",
messages=[{"role":"user","content":"Who won the World Cup in 2022?"}],
tools=tools
)

print(response)
  1. Setup config.yaml
model_list:
-model_name: claude-4-sonnet
litellm_params:
model: anthropic/claude-sonnet-4-20250514
api_key: os.environ/ANTHROPIC_API_KEY
  1. Start proxy
litellm --config /path/to/config.yaml
  1. Test it!
  • OpenAI Format
  • Anthropic Format
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LITELLM_KEY" \
-d '{
"model": "claude-4-sonnet",
"messages": [{"role": "user", "content": "Who won the World Cup in 2022?"}],
"tools": [{"type": "mcp", "server_label": "deepwiki", "server_url": "https://mcp.deepwiki.com/mcp", "require_approval": "never"}]
}'
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LITELLM_KEY" \
-d '{
"model": "claude-4-sonnet",
"messages": [{"role": "user", "content": "Who won the World Cup in 2022?"}],
"tools": [
{
"type": "url",
"url": "https://mcp.deepwiki.com/mcp",
"name": "deepwiki-mcp",
}
]
}'

Parallel Function Calling

Here's how to pass the result of a function call back to an anthropic model:

from litellm import completion
import os

os.environ["ANTHROPIC_API_KEY"]="sk-ant.."


litellm.set_verbose =True

### 1ST FUNCTION CALL ###
tools =[
{
"type":"function",
"function":{
"name":"get_current_weather",
"description":"Get the current weather in a given location",
"parameters":{
"type":"object",
"properties":{
"location":{
"type":"string",
"description":"The city and state, e.g. San Francisco, CA",
},
"unit":{"type":"string","enum":["celsius","fahrenheit"]},
},
"required":["location"],
},
},
}
]
messages =[
{
"role":"user",
"content":"What's the weather like in Boston today in Fahrenheit?",
}
]
try:
# test without max tokens
response = completion(
model="anthropic/claude-3-opus-20240229",
messages=messages,
tools=tools,
tool_choice="auto",
)
# Add any assertions, here to check response args
print(response)
assertisinstance(response.choices[0].message.tool_calls[0].function.name,str)
assertisinstance(
response.choices[0].message.tool_calls[0].function.arguments,str
)

messages.append(
response.choices[0].message.model_dump()
)# Add assistant tool invokes
tool_result =(
'{"location": "Boston", "temperature": "72", "unit": "fahrenheit"}'
)
# Add user submitted tool results in the OpenAI format
messages.append(
{
"tool_call_id": response.choices[0].message.tool_calls[0].id,
"role":"tool",
"name": response.choices[0].message.tool_calls[0].function.name,
"content": tool_result,
}
)
### 2ND FUNCTION CALL ###
# In the second response, Claude should deduce answer from tool results
second_response = completion(
model="anthropic/claude-3-opus-20240229",
messages=messages,
tools=tools,
tool_choice="auto",
)
print(second_response)
except Exception as e:
print(f"An error occurred - {str(e)}")

s/o @Shekhar Patnaik for requesting this!

Context Management (Beta)

Anthropic’s context editing API lets you automatically clear older tool results or thinking blocks. LiteLLM now forwards the native context_management payload when you call Anthropic models, and automatically attaches the required context-management-2025-06-27 beta header.

from litellm import completion

response = completion(
model="anthropic/claude-sonnet-4-20250514",
messages=[{"role":"user","content":"Summarize the latest tool results"}],
context_management={
"edits":[
{
"type":"clear_tool_uses_20250919",
"trigger":{"type":"input_tokens","value":30000},
"keep":{"type":"tool_uses","value":3},
"clear_at_least":{"type":"input_tokens","value":5000},
"exclude_tools":["web_search"],
}
]
},
)

Anthropic Hosted Tools (Computer, Text Editor, Web Search, Memory)

  • Computer
  • Text Editor
  • Web Search
  • Memory
from litellm import completion

tools =[
{
"type":"computer_20241022",
"function":{
"name":"computer",
"parameters":{
"display_height_px":100,
"display_width_px":100,
"display_number":1,
},
},
}
]
model ="claude-3-5-sonnet-20241022"
messages =[{"role":"user","content":"Save a picture of a cat to my desktop."}]

resp = completion(
model=model,
messages=messages,
tools=tools,
# headers={"anthropic-beta": "computer-use-2024-10-22"},
)

print(resp)
  • SDK
  • PROXY
from litellm import completion

tools =[{
"type":"text_editor_20250124",
"name":"str_replace_editor"
}]
model ="claude-3-5-sonnet-20241022"
messages =[{"role":"user","content":"There's a syntax error in my primes.py file. Can you help me fix it?"}]

resp = completion(
model=model,
messages=messages,
tools=tools,
)

print(resp)
  1. Setup config.yaml
-model_name: claude-3-5-sonnet-latest
litellm_params:
model: anthropic/claude-3-5-sonnet-latest
api_key: os.environ/ANTHROPIC_API_KEY
  1. Start proxy
litellm --config /path/to/config.yaml
  1. Test it!
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LITELLM_KEY" \
-d '{
"model": "claude-3-5-sonnet-latest",
"messages": [{"role": "user", "content": "There's a syntax error in my primes.py file. Can you help me fix it?"}],
"tools": [{"type": "text_editor_20250124", "name": "str_replace_editor"}]
}'
info

Live from v1.70.1+

LiteLLM maps OpenAI's search_context_size param to Anthropic's max_uses param.

OpenAIAnthropic
Low1
Medium5
High10
  • SDK
  • PROXY
  • OpenAI Format
  • Anthropic Format
from litellm import completion

model ="claude-3-5-sonnet-20241022"
messages =[{"role":"user","content":"What's the weather like today?"}]

resp = completion(
model=model,
messages=messages,
web_search_options={
"search_context_size":"medium",
"user_location":{
"type":"approximate",
"approximate":{
"city":"San Francisco",
},
}
}
)

print(resp)
from litellm import completion

tools =[{
"type":"web_search_20250305",
"name":"web_search",
"max_uses":5
}]
model ="claude-3-5-sonnet-20241022"
messages =[{"role":"user","content":"There's a syntax error in my primes.py file. Can you help me fix it?"}]

resp = completion(
model=model,
messages=messages,
tools=tools,
)

print(resp)
  1. Setup config.yaml
-model_name: claude-3-5-sonnet-latest
litellm_params:
model: anthropic/claude-3-5-sonnet-latest
api_key: os.environ/ANTHROPIC_API_KEY
  1. Start proxy
litellm --config /path/to/config.yaml
  1. Test it!
  • OpenAI Format
  • Anthropic Format
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LITELLM_KEY" \
-d '{
"model": "claude-3-5-sonnet-latest",
"messages": [{"role": "user", "content": "What's the weather like today?"}],
"web_search_options": {
"search_context_size": "medium",
"user_location": {
"type": "approximate",
"approximate": {
"city": "San Francisco",
},
}
}
}'
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LITELLM_KEY" \
-d '{
"model": "claude-3-5-sonnet-latest",
"messages": [{"role": "user", "content": "What's the weather like today?"}],
"tools": [{
"type": "web_search_20250305",
"name": "web_search",
"max_uses": 5
}]
}'
info

The Anthropic Memory tool is currently in beta.

  • SDK
  • Proxy
from litellm import completion

tools =[{
"type":"memory_20250818",
"name":"memory"
}]

model ="claude-sonnet-4-5-20250929"
messages =[{"role":"user","content":"Please remember that my favorite color is blue."}]

response = completion(
model=model,
messages=messages,
tools=tools,
)

print(response)
  1. Setup config.yaml
model_list:
-model_name: claude-memory-model
litellm_params:
model: anthropic/claude-sonnet-4-5-20250929
api_key: os.environ/ANTHROPIC_API_KEY
  1. Start proxy
litellm --config /path/to/config.yaml
  1. Test it!
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LITELLM_KEY" \
-d '{
"model": "claude-memory-model",
"messages": [{"role": "user", "content": "Please remember that my favorite color is blue."}],
"tools": [{"type": "memory_20250818", "name": "memory"}]
}'

Usage - Vision

from litellm import completion

# set env
os.environ["ANTHROPIC_API_KEY"]="your-api-key"

defencode_image(image_path):
import base64

withopen(image_path,"rb")as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")


image_path ="../proxy/cached_logo.jpg"
# Getting the base64 string
base64_image = encode_image(image_path)
resp = litellm.completion(
model="anthropic/claude-3-opus-20240229",
messages=[
{
"role":"user",
"content":[
{"type":"text","text":"Whats in this image?"},
{
"type":"image_url",
"image_url":{
"url":"data:image/jpeg;base64,"+ base64_image
},
},
],
}
],
)
print(f"\nResponse: {resp}")

Usage - Thinking / reasoning_content

LiteLLM translates OpenAI's reasoning_effort to Anthropic's thinking parameter. Code

reasoning_effortthinking
"low""budget_tokens": 1024
"medium""budget_tokens": 2048
"high""budget_tokens": 4096
note

reasoning_effort maps to Anthropic's [adaptive thinking](https: //docs.claude.com/en/docs/build-with-claude/extended-thinking/adaptive-thinking) plus the output_config.effort parameter on Claude 4.6 and 4.7 models (including claude-opus-4-6, claude-opus-4-7, claude-sonnet-4-6, etc. ), not budget_tokens. In particular, LiteLLM will inject the following into the underlying Anthropic request on the OpenAI-compatible /chat/completions route:

{
"thinking":{"type":"adaptive"},
"output_config":{"effort":"<low|medium|high|xhigh|max>"}
}

This means any value other than "none" for reasoning_effort will automatically turn thinking on for these models, even though the OpenAI-compatible request body does not have a separate thinking field. This is intended to match Anthropic's own recommended usage: budget_tokens has been deprecated on 4.6 models and rejected entirely on Opus 4.7, where only adaptive is a supported thinking mode.

You can disable thinking either by omitting reasoning_effort entirely or setting it to "none". LiteLLM will not send a thinking field in that case. You can still pass the native thinking parameter directly if you wish to explicitly control thinking with a fixed budget on prior models:

from litellm import completion

# Disable thinking on Claude 4.6/4.7
resp = completion(
model="anthropic/claude-opus-4-7",
messages=[{"role":"user","content":"What is the capital of France?"}],
reasoning_effort="none",# no thinking field sent
)

# Explicit budget (pre-4.6 models; deprecated on 4.6, rejected on Opus 4.7)
resp = completion(
model="anthropic/claude-sonnet-4-5-20250929",
messages=[{"role":"user","content":"What is the capital of France?"}],
thinking={"type":"enabled","budget_tokens":1024},
)

The Anthropic /v1/messages passthrough route is unaffected by this reasoning effort mapping. thinking is passed through unchanged.

  • SDK
  • PROXY
from litellm import completion

resp = completion(
model="anthropic/claude-3-7-sonnet-20250219",
messages=[{"role":"user","content":"What is the capital of France?"}],
reasoning_effort="low",
)

  1. Setup config.yaml
-model_name: claude-3-7-sonnet-20250219
litellm_params:
model: anthropic/claude-3-7-sonnet-20250219
api_key: os.environ/ANTHROPIC_API_KEY
  1. Start proxy
litellm --config /path/to/config.yaml
  1. Test it!
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
-d '{
"model": "claude-3-7-sonnet-20250219",
"messages": [{"role": "user", "content": "What is the capital of France?"}],
"reasoning_effort": "low"
}'

Expected Response

ModelResponse(
id='chatcmpl-c542d76d-f675-4e87-8e5f-05855f5d0f5e',
created=1740470510,
model='claude-3-7-sonnet-20250219',
object='chat.completion',
system_fingerprint=None,
choices=[
Choices(
finish_reason='stop',
index=0,
message=Message(
content="The capital of France is Paris.",
role='assistant',
tool_calls=None,
function_call=None,
provider_specific_fields={
'citations':None,
'thinking_blocks':[
{
'type':'thinking',
'thinking':'The capital of France is Paris. This is a very straightforward factual question.',
'signature':'EuYBCkQYAiJAy6...'
}
]
}
),
thinking_blocks=[
{
'type':'thinking',
'thinking':'The capital of France is Paris. This is a very straightforward factual question.',
'signature':'EuYBCkQYAiJAy6AGB...'
}
],
reasoning_content='The capital of France is Paris. This is a very straightforward factual question.'
)
],
usage=Usage(
completion_tokens=68,
prompt_tokens=42,
total_tokens=110,
completion_tokens_details=None,
prompt_tokens_details=PromptTokensDetailsWrapper(
audio_tokens=None,
cached_tokens=0,
text_tokens=None,
image_tokens=None
),
cache_creation_input_tokens=0,
cache_read_input_tokens=0
)
)

Pass thinking to Anthropic models

You can also pass the thinking parameter to Anthropic models.

You can also pass the thinking parameter to Anthropic models.

  • SDK
  • PROXY
response = litellm.completion(
model="anthropic/claude-3-7-sonnet-20250219",
messages=[{"role":"user","content":"What is the capital of France?"}],
thinking={"type":"enabled","budget_tokens":1024},
)
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LITELLM_KEY" \
-d '{
"model": "anthropic/claude-3-7-sonnet-20250219",
"messages": [{"role": "user", "content": "What is the capital of France?"}],
"thinking": {"type": "enabled", "budget_tokens": 1024}
}'

Adaptive Thinking (Claude Opus 4.6)

  • SDK
  • PROXY
response = litellm.completion(
model="anthropic/claude-opus-4-6",
messages=[{"role":"user","content":"What is the optimal strategy for solving this problem?"}],
thinking={"type":"adaptive"},
)
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LITELLM_KEY" \
-d '{
"model": "anthropic/claude-opus-4-6",
"messages": [{"role": "user", "content": "What is the optimal strategy for solving this problem?"}],
"thinking": {"type": "adaptive"}
}'

Enabled Thinking with Budget

  • SDK
  • PROXY
response = litellm.completion(
model="anthropic/claude-opus-4-6",
messages=[{"role":"user","content":"What is the capital of France?"}],
thinking={"type":"enabled","budget_tokens":5000},
)
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LITELLM_KEY" \
-d '{
"model": "anthropic/claude-opus-4-6",
"messages": [{"role": "user", "content": "What is the capital of France?"}],
"thinking": {"type": "enabled", "budget_tokens": 5000}
}'

Passing Extra Headers to Anthropic API

Pass extra_headers: dict to litellm.completion

from litellm import completion
messages =[{"role":"user","content":"What is Anthropic?"}]
response = completion(
model="claude-3-5-sonnet-20240620",
messages=messages,
extra_headers={"anthropic-beta":"max-tokens-3-5-sonnet-2024-07-15"}
)

Usage - "Assistant Pre-fill"

You can "put words in Claude's mouth" by including an assistant role message as the last item in the messages array.

[!IMPORTANT] The returned completion will not include your "pre-fill" text, since it is part of the prompt itself. Make sure to prefix Claude's completion with your pre-fill.

import os
from litellm import completion

# set env - [OPTIONAL] replace with your anthropic key
os.environ["ANTHROPIC_API_KEY"]="your-api-key"

messages =[
{"role":"user","content":"How do you say 'Hello' in German? Return your answer as a JSON object, like this:\n\n{ \"Hello\": \"Hallo\" }"},
{"role":"assistant","content":"{"},
]
response = completion(model="claude-2.1", messages=messages)
print(response)

Example prompt sent to Claude


Human: How do you say 'Hello' in German? Return your answer as a JSON object, like this:

{ "Hello": "Hallo" }

Assistant: {

Usage - "System" messages

If you're using Anthropic's Claude 2.1, system role messages are properly formatted for you.

import os
from litellm import completion

# set env - [OPTIONAL] replace with your anthropic key
os.environ["ANTHROPIC_API_KEY"]="your-api-key"

messages =[
{"role":"system","content":"You are a snarky assistant."},
{"role":"user","content":"How do I boil water?"},
]
response = completion(model="claude-2.1", messages=messages)

Example prompt sent to Claude

You are a snarky assistant.

Human: How do I boil water?

Assistant:

Usage - PDF

Pass base64 encoded PDF files to Anthropic models using the file content type with a file_data field.

  • SDK
  • PROXY

using base64

from litellm import completion, supports_pdf_input
import base64
import requests

# URL of the file
url ="https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf"

# Download the file
response = requests.get(url)
file_data = response.content

encoded_file = base64.b64encode(file_data).decode("utf-8")

## check if model supports pdf input - (2024/11/11) only claude-3-5-haiku-20241022 supports it
supports_pdf_input("anthropic/claude-3-5-haiku-20241022")# True

response = completion(
model="anthropic/claude-3-5-haiku-20241022",
messages=[
{
"role":"user",
"content":[
{"type":"text","text":"You are a very professional document summarization specialist. Please summarize the given document."},
{
"type":"file",
"file":{
"file_data":f"data:application/pdf;base64,{encoded_file}",# 👈 PDF
}
},
],
}
],
max_tokens=300,
)

print(response.choices[0])
  1. Add model to config
-model_name: claude-3-5-haiku-20241022
litellm_params:
model: anthropic/claude-3-5-haiku-20241022
api_key: os.environ/ANTHROPIC_API_KEY
  1. Start Proxy
litellm --config /path/to/config.yaml
  1. Test it!
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
-d '{
"model": "claude-3-5-haiku-20241022",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "You are a very professional document summarization specialist. Please summarize the given document"
},
{
"type": "file",
"file": {
"file_data": f"data:application/pdf;base64,{encoded_file}", # 👈 PDF
}
}
}
]
}
],
"max_tokens": 300
}'

[BETA] Citations API

Pass citations: {"enabled": true} to Anthropic, to get citations on your document responses.

Note: This interface is in BETA. If you have feedback on how citations should be returned, please tell us here

  • SDK
  • PROXY
from litellm import completion

resp = completion(
model="claude-3-5-sonnet-20241022",
messages=[
{
"role":"user",
"content":[
{
"type":"document",
"source":{
"type":"text",
"media_type":"text/plain",
"data":"The grass is green. The sky is blue.",
},
"title":"My Document",
"context":"This is a trustworthy document.",
"citations":{"enabled":True},
},
{
"type":"text",
"text":"What color is the grass and sky?",
},
],
}
],
)

citations = resp.choices[0].message.provider_specific_fields["citations"]

assert citations isnotNone
  1. Setup config.yaml
model_list:
-model_name: anthropic-claude
litellm_params:
model: anthropic/claude-3-5-sonnet-20241022
api_key: os.environ/ANTHROPIC_API_KEY
  1. Start proxy
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000
  1. Test it!
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "anthropic-claude",
"messages": [
{
"role": "user",
"content": [
{
"type": "document",
"source": {
"type": "text",
"media_type": "text/plain",
"data": "The grass is green. The sky is blue.",
},
"title": "My Document",
"context": "This is a trustworthy document.",
"citations": {"enabled": True},
},
{
"type": "text",
"text": "What color is the grass and sky?",
},
],
}
]
}'

Files API

Upload files once and reference them by file_id in multiple requests—no need to re-upload content each time.

info

The file_id obtained from Anthropic only works with Anthropic Claude models. You cannot use it with other providers (OpenAI, Bedrock, etc.).

  • Max file size: 500 MB | Total storage: 100 GB per org
  • Pricing: File API operations are free. File content used in Messages requests is priced as input tokens.

Supported models by file type:

  • Images: All Claude 3+ models
  • PDFs: All Claude 3.5+ models
  • Other file types (for code execution): Claude 3.5 Haiku + all Claude 3.7+ models

Quick Start

import litellm
import os

os.environ["ANTHROPIC_API_KEY"]="sk-ant-..."

# 1. Upload a file once
file= litellm.create_file(
file=open("document.pdf","rb"),
purpose="messages",
custom_llm_provider="anthropic",
)

# 2. Use file_id in messages (no re-upload needed)
response = litellm.completion(
model="anthropic/claude-sonnet-4-5-20250929",
messages=[{
"role":"user",
"content":[
{"type":"text","text":"Summarize this document"},
{"type":"file","file":{"file_id":file.id,"format":"application/pdf"}}
]
}]
)

File Operations

OperationFunction
Uploadlitellm.create_file(file, purpose="messages", custom_llm_provider="anthropic")
Listlitellm.file_list(custom_llm_provider="anthropic")
Retrievelitellm.file_retrieve(file_id, custom_llm_provider="anthropic")
Deletelitellm.file_delete(file_id, custom_llm_provider="anthropic")
Downloadlitellm.file_content(file_id, custom_llm_provider="anthropic")
note

Download only works for files created by the code execution tool, not uploaded files.

Supported Formats

File TypeFormat Value
PDFapplication/pdf
Plain texttext/plain
JPEGimage/jpeg
PNGimage/png
GIFimage/gif
WebPimage/webp

Using Images

# Upload image
image = litellm.create_file(
file=open("photo.jpg","rb"),
purpose="messages",
custom_llm_provider="anthropic",
)

# Use in message
response = litellm.completion(
model="anthropic/claude-sonnet-4-5-20250929",
messages=[{
"role":"user",
"content":[
{"type":"text","text":"What's in this image?"},
{"type":"file","file":{"file_id": image.id,"format":"image/jpeg"}}
]
}]
)

Usage - passing 'user_id' to Anthropic

LiteLLM translates the OpenAI user param to Anthropic's metadata[user_id] param.

  • SDK
  • PROXY
response = completion(
model="claude-3-5-sonnet-20240620",
messages=messages,
user="user_123",
)
  1. Setup config.yaml
model_list:
-model_name: claude-3-5-sonnet-20240620
litellm_params:
model: anthropic/claude-3-5-sonnet-20240620
api_key: os.environ/ANTHROPIC_API_KEY
  1. Start Proxy
litellm --config /path/to/config.yaml
  1. Test it!
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
-d '{
"model": "claude-3-5-sonnet-20240620",
"messages": [{"role": "user", "content": "What is Anthropic?"}],
"user": "user_123"
}'

Usage - Agent Skills

LiteLLM supports using Agent Skills with the API

  • SDK
  • PROXY
response = completion(
model="claude-sonnet-4-5-20250929",
messages=messages,
tools=[
{
"type":"code_execution_20250825",
"name":"code_execution"
}
],
container={
"skills":[
{
"type":"anthropic",
"skill_id":"pptx",
"version":"latest"
}
]
}
)
  1. Setup config.yaml
model_list:
-model_name: claude-sonnet-4-5-20250929
litellm_params:
model: anthropic/claude-sonnet-4-5-20250929
api_key: os.environ/ANTHROPIC_API_KEY
  1. Start Proxy
litellm --config /path/to/config.yaml
  1. Test it!
curl --location 'http://localhost:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR-LITELLM-KEY>' \
--data '{
"model": "claude-sonnet-4-5-20250929",
"messages": [
{
"role": "user",
"content": "Hi"
}
],
"tools": [
{
"type": "code_execution_20250825",
"name": "code_execution"
}
],
"container": {
"skills": [
{
"type": "anthropic",
"skill_id": "pptx",
"version": "latest"
}
]
}
}'

The container and its "id" will be present in "provider_specific_fields" in streaming/non-streaming response

🚅
LiteLLM Enterprise
SSO/SAML, audit logs, spend tracking, multi-team management, and guardrails — built for production.
Learn more →