Overview
| Property | Details |
|---|---|
| Description | Azure OpenAI Service provides REST API access to OpenAI's powerful language models including o1, o1-mini, GPT-5, GPT-4o, GPT-4o mini, GPT-4 Turbo with Vision, GPT-4, GPT-3.5-Turbo, and Embeddings model series. Also supports Claude models via Azure Foundry. |
| Provider Route on LiteLLM | azure/, azure/o_series/, azure/gpt5_series/, azure/claude-* (Claude models via Azure Foundry) |
| Supported Operations | /chat/completions, /responses, /completions, /embeddings, /audio/speech, /audio/transcriptions, /fine_tuning, /batches, /files, /images, /anthropic/v1/messages |
| Link to Provider Doc | Azure OpenAI ↗, Azure Foundry Claude ↗ |
API Keys, Params
api_key, api_base, api_version etc can be passed directly to litellm.completion - see here or set as litellm.api_key params see here
import os
os.environ["AZURE_API_KEY"]=""# "my-azure-api-key"
os.environ["AZURE_API_BASE"]=""# "https://example-endpoint.openai.azure.com"
os.environ["AZURE_API_VERSION"]=""# "2023-05-15"
# optional
os.environ["AZURE_AD_TOKEN"]=""
os.environ["AZURE_API_TYPE"]=""
Azure also supports Claude models via Azure Foundry. Use azure/claude-* model names (e.g., azure/claude-sonnet-4-5) with Azure authentication. See the Azure Anthropic documentation for details.
Usage - LiteLLM Python SDK
👁 Open In ColabCompletion - using .env variables
from litellm import completion
## set ENV variables
os.environ["AZURE_API_KEY"]=""
os.environ["AZURE_API_BASE"]=""
os.environ["AZURE_API_VERSION"]=""
# azure call
response = completion(
model ="azure/<your_deployment_name>",
messages =[{"content":"Hello, how are you?","role":"user"}]
)
Completion - using api_key, api_base, api_version
import litellm
# azure call
response = litellm.completion(
model ="azure/<your deployment name>",# model = azure/<your deployment name>
api_base ="",# azure api base
api_version ="",# azure api version
api_key ="",# azure api key
messages =[{"role":"user","content":"good morning"}],
)
Completion - using azure_ad_token, api_base, api_version
import litellm
# azure call
response = litellm.completion(
model ="azure/<your deployment name>",# model = azure/<your deployment name>
api_base ="",# azure api base
api_version ="",# azure api version
azure_ad_token="",# azure_ad_token
messages =[{"role":"user","content":"good morning"}],
)
Usage - LiteLLM Proxy Server
Here's how to call Azure OpenAI models with the LiteLLM Proxy Server
1. Save key in your environment
export AZURE_API_KEY=""
2. Start the proxy
model_list:
-model_name: gpt-3.5-turbo
litellm_params:
model: azure/chatgpt-v-2
api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
api_version:"2023-05-15"
api_key: os.environ/AZURE_API_KEY # The `os.environ/` prefix tells litellm to read this from the env.
3. Test it
- Curl Request
- OpenAI v1.0.0+
- Langchain
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}
'
import openai
client = openai.OpenAI(
api_key="anything",
base_url="http://0.0.0.0:4000"
)
response = client.chat.completions.create(model="gpt-3.5-turbo", messages =[
{
"role":"user",
"content":"this is a test request, write a short poem"
}
])
print(response)
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import(
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage
chat = ChatOpenAI(
openai_api_base="http://0.0.0.0:4000",# set openai_api_base to the LiteLLM Proxy
model ="gpt-3.5-turbo",
temperature=0.1
)
messages =[
SystemMessage(
content="You are a helpful assistant that im using to make a test request to."
),
HumanMessage(
content="test from litellm. tell me why it's amazing in 1 sentence"
),
]
response = chat(messages)
print(response)
Setting API Version
You can set the api_version for Azure OpenAI in your proxy config.yaml in the following ways
Option 1: Per Model Configuration
model_list:
-model_name: gpt-4
litellm_params:
model: azure/my-gpt4-deployment
api_base: https://your-resource.openai.azure.com/
api_version:"2024-08-01-preview"# Set version per model
api_key: os.environ/AZURE_API_KEY
Azure OpenAI Chat Completion Models
We support ALL Azure models, just set model=azure/<your deployment name> as a prefix when sending litellm requests
| Model Name | Function Call |
|---|---|
| o1-mini | response = completion(model="azure/<your deployment name>", messages=messages) |
| o1-preview | response = completion(model="azure/<your deployment name>", messages=messages) |
| gpt-5 | response = completion(model="azure/<your deployment name>", messages=messages) |
| gpt-4o-mini | completion('azure/<your deployment name>', messages) |
| gpt-4o | completion('azure/<your deployment name>', messages) |
| gpt-4 | completion('azure/<your deployment name>', messages) |
| gpt-4-0314 | completion('azure/<your deployment name>', messages) |
| gpt-4-0613 | completion('azure/<your deployment name>', messages) |
| gpt-4-32k | completion('azure/<your deployment name>', messages) |
| gpt-4-32k-0314 | completion('azure/<your deployment name>', messages) |
| gpt-4-32k-0613 | completion('azure/<your deployment name>', messages) |
| gpt-4-1106-preview | completion('azure/<your deployment name>', messages) |
| gpt-4-0125-preview | completion('azure/<your deployment name>', messages) |
| gpt-3.5-turbo | completion('azure/<your deployment name>', messages) |
| gpt-3.5-turbo-0301 | completion('azure/<your deployment name>', messages) |
| gpt-3.5-turbo-0613 | completion('azure/<your deployment name>', messages) |
| gpt-3.5-turbo-16k | completion('azure/<your deployment name>', messages) |
| gpt-3.5-turbo-16k-0613 | completion('azure/<your deployment name>', messages) |
Azure OpenAI Vision Models
| Model Name | Function Call |
|---|---|
| gpt-4-vision | completion(model="azure/<your deployment name>", messages=messages) |
| gpt-4o | completion('azure/<your deployment name>', messages) |
Usage
import os
from litellm import completion
os.environ["AZURE_API_KEY"]="your-api-key"
# azure call
response = completion(
model ="azure/<your deployment name>",
messages=[
{
"role":"user",
"content":[
{
"type":"text",
"text":"What’s in this image?"
},
{
"type":"image_url",
"image_url":{
"url":"https://awsmp-logos.s3.amazonaws.com/seller-xw5kijmvmzasy/c233c9ade2ccb5491072ae232c814942.png"
}
}
]
}
],
)
Usage - with Azure Vision enhancements
Note: Azure requires the base_url to be set with /extensions
Example
base_url=https://gpt-4-vision-resource.openai.azure.com/openai/deployments/gpt-4-vision/extensions
# base_url="{azure_endpoint}/openai/deployments/{azure_deployment}/extensions"
Usage
import os
from litellm import completion
os.environ["AZURE_API_KEY"]="your-api-key"
# azure call
response = completion(
model="azure/gpt-4-vision",
timeout=5,
messages=[
{
"role":"user",
"content":[
{"type":"text","text":"Whats in this image?"},
{
"type":"image_url",
"image_url":{
"url":"https://avatars.githubusercontent.com/u/29436595?v=4"
},
},
],
}
],
base_url="https://gpt-4-vision-resource.openai.azure.com/openai/deployments/gpt-4-vision/extensions",
api_key=os.getenv("AZURE_VISION_API_KEY"),
enhancements={"ocr":{"enabled":True},"grounding":{"enabled":True}},
dataSources=[
{
"type":"AzureComputerVision",
"parameters":{
"endpoint":"https://gpt-4-vision-enhancement.cognitiveservices.azure.com/",
"key": os.environ["AZURE_VISION_ENHANCE_KEY"],
},
}
],
)
O-Series Models
Azure OpenAI O-Series models are supported on LiteLLM.
LiteLLM routes any deployment name with o1 or o3 in the model name, to the O-Series transformation logic.
To set this explicitly, set model to azure/o_series/<your-deployment-name>.
Automatic Routing
- SDK
- PROXY
import litellm
litellm.completion(model="azure/my-o3-deployment", messages=[{"role":"user","content":"Hello, world!"}])# 👈 Note: 'o3' in the deployment name
model_list:
-model_name: o3-mini
litellm_params:
model: azure/o3-model
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
Explicit Routing
- SDK
- PROXY
import litellm
litellm.completion(model="azure/o_series/my-random-deployment-name", messages=[{"role":"user","content":"Hello, world!"}])# 👈 Note: 'o_series/' in the deployment name
model_list:
-model_name: o3-mini
litellm_params:
model: azure/o_series/my-random-deployment-name
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
GPT-5 Models
| Property | Details |
|---|---|
| Description | Azure OpenAI GPT-5 models |
| Provider Route on LiteLLM | azure/gpt5_series/<custom-name> or azure/gpt-5-deployment-name |
LiteLLM supports using Azure GPT-5 models in one of the two ways:
- Explicit Routing:
model = azure/gpt5_series/<deployment-name>. In this scenario the model onboarded to litellm follows the formatmodel=azure/gpt5_series/<deployment-name>. - Inferred Routing (If the azure deployment name contains
gpt-5in the name):model = azure/gpt-5-mini. In this scenario the model onboarded to litellm follows the formatmodel=azure/gpt-5-mini.
Explicit Routing
Use azure/gpt5_series/<deployment-name> for explicit GPT-5 model routing.
- SDK
- PROXY
import litellm
response = litellm.completion(
model="azure/gpt5_series/my-gpt-5-deployment",
messages=[{"role":"user","content":"Hello, world!"}]
)
model_list:
-model_name: gpt-5
litellm_params:
model: azure/gpt5_series/my-gpt-5-deployment
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
Inferred Routing (gpt-5 in the deployment name)
If your Azure deployment name contains gpt-5, LiteLLM automatically recognizes it as a GPT-5 model.
- SDK
- PROXY
import litellm
# Deployment name contains 'gpt-5' - automatically inferred
response = litellm.completion(
model="azure/my-gpt-5-deployment",
messages=[{"role":"user","content":"Hello, world!"}]
)
model_list:
-model_name: gpt-5-mini
litellm_params:
model: azure/my-gpt-5-deployment # deployment name contains 'gpt-5'
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
Azure Audio Model
- SDK
- PROXY
from litellm import completion
import os
os.environ["AZURE_API_KEY"]=""
os.environ["AZURE_API_BASE"]=""
os.environ["AZURE_API_VERSION"]=""
response = completion(
model="azure/azure-openai-4o-audio",
messages=[
{
"role":"user",
"content":"I want to try out speech to speech"
}
],
modalities=["text","audio"],
audio={"voice":"alloy","format":"wav"}
)
print(response)
- Setup config.yaml
model_list:
-model_name: azure-openai-4o-audio
litellm_params:
model: azure/azure-openai-4o-audio
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
api_version: os.environ/AZURE_API_VERSION
- Start proxy
litellm --config /path/to/config.yaml
- Test it!
curl http://localhost:4000/v1/chat/completions \
-H "Authorization: Bearer $LITELLM_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "azure-openai-4o-audio",
"messages": [{"role": "user", "content": "I want to try out speech to speech"}],
"modalities": ["text","audio"],
"audio": {"voice": "alloy", "format": "wav"}
}'
Azure Instruct Models
Use model="azure_text/<your-deployment>"
| Model Name | Function Call |
|---|---|
| gpt-3.5-turbo-instruct | response = completion(model="azure_text/<your deployment name>", messages=messages) |
| gpt-3.5-turbo-instruct-0914 | response = completion(model="azure_text/<your deployment name>", messages=messages) |
import litellm
## set ENV variables
os.environ["AZURE_API_KEY"]=""
os.environ["AZURE_API_BASE"]=""
os.environ["AZURE_API_VERSION"]=""
response = litellm.completion(
model="azure_text/<your-deployment-name",
messages=[{"role":"user","content":"What is the weather like in Boston?"}]
)
print(response)
Authentication
Entra ID - use azure_ad_token
This is a walkthrough on how to use Azure Active Directory Tokens - Microsoft Entra ID to make litellm.completion() calls.
Note: You can follow the same process below to use Azure Active Directory Tokens for all other Azure endpoints (e.g., chat, embeddings, image, audio, etc.) with LiteLLM.
Step 1 - Download Azure CLI Installation instructions: https://learn.microsoft.com/en-us/cli/azure/install-azure-cli
brew update && brew install azure-cli
Step 2 - Sign in using az
az login --output table
Step 3 - Generate azure ad token
az account get-access-token --resource https://cognitiveservices.azure.com
In this step you should see an accessToken generated
{
"accessToken": "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6IjlHbW55RlBraGMzaE91UjIybXZTdmduTG83WSIsImtpZCI6IjlHbW55RlBraGMzaE91UjIybXZTdmduTG83WSJ9",
"expiresOn": "2023-11-14 15:50:46.000000",
"expires_on": 1700005846,
"subscription": "db38de1f-4bb3..",
"tenant": "bdfd79b3-8401-47..",
"tokenType": "Bearer"
}
Step 4 - Make litellm.completion call with Azure AD token
Set azure_ad_token = accessToken from step 3 or set os.environ['AZURE_AD_TOKEN']
- SDK
- PROXY config.yaml
response = litellm.completion(
model ="azure/<your deployment name>",# model = azure/<your deployment name>
api_base ="",# azure api base
api_version ="",# azure api version
azure_ad_token="",# your accessToken from step 3
messages =[{"role":"user","content":"good morning"}],
)
model_list:
-model_name: gpt-3.5-turbo
litellm_params:
model: azure/chatgpt-v-2
api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
api_version:"2023-05-15"
azure_ad_token: os.environ/AZURE_AD_TOKEN
Entra ID - use tenant_id, client_id, client_secret
Here is an example of setting up tenant_id, client_id, client_secret in your litellm proxy config.yaml
model_list:
-model_name: gpt-3.5-turbo
litellm_params:
model: azure/chatgpt-v-2
api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
api_version:"2023-05-15"
tenant_id: os.environ/AZURE_TENANT_ID
client_id: os.environ/AZURE_CLIENT_ID
client_secret: os.environ/AZURE_CLIENT_SECRET
azure_scope: os.environ/AZURE_SCOPE # defaults to "https://cognitiveservices.azure.com/.default"
Test it
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}
'
Example video of using tenant_id, client_id, client_secret with LiteLLM Proxy Server
Entra ID - use client_id, username, password
Here is an example of setting up client_id, azure_username, azure_password in your litellm proxy config.yaml
model_list:
-model_name: gpt-3.5-turbo
litellm_params:
model: azure/chatgpt-v-2
api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
api_version:"2023-05-15"
client_id: os.environ/AZURE_CLIENT_ID
azure_username: os.environ/AZURE_USERNAME
azure_password: os.environ/AZURE_PASSWORD
azure_scope: os.environ/AZURE_SCOPE # defaults to "https://cognitiveservices.azure.com/.default"
Test it
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}
'
Azure AD Token Refresh - DefaultAzureCredential
Use this if you want to use Azure DefaultAzureCredential for Authentication on your requests. DefaultAzureCredential automatically discovers and uses available Azure credentials from multiple sources.
- SDK
- PROXY config.yaml
Option 1: Explicit DefaultAzureCredential (Recommended)
from litellm import completion
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
# DefaultAzureCredential automatically discovers credentials from:
# - Environment variables (AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID)
# - Managed Identity (AKS, Azure VMs, etc.)
# - Azure CLI credentials
# - And other Azure identity sources
token_provider = get_bearer_token_provider(DefaultAzureCredential(),"https://cognitiveservices.azure.com/.default")
response = completion(
model ="azure/<your deployment name>",# model = azure/<your deployment name>
api_base ="",# azure api base
api_version ="",# azure api version
azure_ad_token_provider=token_provider,
messages =[{"role":"user","content":"good morning"}],
)
Option 2: LiteLLM Auto-Fallback to DefaultAzureCredential
import litellm
# Enable automatic fallback to DefaultAzureCredential
litellm.enable_azure_ad_token_refresh =True
response = litellm.completion(
model ="azure/<your deployment name>",
api_base ="",
api_version ="",
messages =[{"role":"user","content":"good morning"}],
)
Scenario 1: With Environment Variables (Traditional)
- Add relevant env vars
export AZURE_TENANT_ID=""
export AZURE_CLIENT_ID=""
export AZURE_CLIENT_SECRET=""
- Setup config.yaml
model_list:
-model_name: gpt-3.5-turbo
litellm_params:
model: azure/your-deployment-name
api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
litellm_settings:
enable_azure_ad_token_refresh:true# 👈 KEY CHANGE
Scenario 2: Managed Identity (AKS, Azure VMs) - No Hard-coded Credentials Required
Perfect for AKS clusters, Azure VMs, or other managed environments where Azure automatically injects credentials.
model_list:
-model_name: gpt-3.5-turbo
litellm_params:
model: azure/your-deployment-name
api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
litellm_settings:
enable_azure_ad_token_refresh:true# 👈 KEY CHANGE
Scenario 3: Azure CLI Authentication
If you're authenticated via az login, no additional configuration needed:
model_list:
-model_name: gpt-3.5-turbo
litellm_params:
model: azure/your-deployment-name
api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
litellm_settings:
enable_azure_ad_token_refresh:true# 👈 KEY CHANGE
- Start proxy
litellm --config /path/to/config.yaml
How it works:
- LiteLLM first tries Service Principal authentication (if environment variables are available)
- If that fails, it automatically falls back to
DefaultAzureCredential DefaultAzureCredentialwill use Managed Identity, Azure CLI credentials, or other available Azure identity sources- This eliminates the need for hard-coded credentials in managed environments like AKS
Azure Batches API
| Property | Details |
|---|---|
| Description | Azure OpenAI Batches API |
custom_llm_provider on LiteLLM | azure/ |
| Supported Operations | /v1/batches, /v1/files |
| Azure OpenAI Batches API | Azure OpenAI Batches API ↗ |
| Cost Tracking, Logging Support | ✅ LiteLLM will log, track cost for Batch API Requests |
Quick Start
Just add the azure env vars to your environment.
export AZURE_API_KEY=""
export AZURE_API_BASE=""
- LiteLLM PROXY Server
- LiteLLM SDK
1. Upload a File
- OpenAI Python SDK
- Curl
from openai import OpenAI
# Initialize the client
client = OpenAI(
base_url="http://localhost:4000",
api_key="your-api-key"
)
batch_input_file = client.files.create(
file=open("mydata.jsonl","rb"),
purpose="batch",
extra_headers={"custom-llm-provider":"azure"}
)
file_id = batch_input_file.id
curl http://localhost:4000/v1/files \
-H "Authorization: Bearer sk-1234" \
-F purpose="batch" \
-F file="@mydata.jsonl"
Example File Format
{"custom_id":"task-0","method":"POST","url":"/chat/completions","body":{"model":"REPLACE-WITH-MODEL-DEPLOYMENT-NAME","messages":[{"role":"system","content":"You are an AI assistant that helps people find information."},{"role":"user","content":"When was Microsoft founded?"}]}}
{"custom_id":"task-1","method":"POST","url":"/chat/completions","body":{"model":"REPLACE-WITH-MODEL-DEPLOYMENT-NAME","messages":[{"role":"system","content":"You are an AI assistant that helps people find information."},{"role":"user","content":"When was the first XBOX released?"}]}}
{"custom_id":"task-2","method":"POST","url":"/chat/completions","body":{"model":"REPLACE-WITH-MODEL-DEPLOYMENT-NAME","messages":[{"role":"system","content":"You are an AI assistant that helps people find information."},{"role":"user","content":"What is Altair Basic?"}]}}
2. Create a Batch Request
- OpenAI Python SDK
- Curl
batch = client.batches.create(# re use client from above
input_file_id=file_id,
endpoint="/v1/chat/completions",
completion_window="24h",
metadata={"description":"My batch job"},
extra_headers={"custom-llm-provider":"azure"}
)
curl http://localhost:4000/v1/batches \
-H "Authorization: Bearer $LITELLM_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input_file_id": "file-abc123",
"endpoint": "/v1/chat/completions",
"completion_window": "24h"
}'
3. Retrieve a Batch
- OpenAI Python SDK
- Curl
retrieved_batch = client.batches.retrieve(
batch.id,
extra_headers={"custom-llm-provider":"azure"}
)
curl http://localhost:4000/v1/batches/batch_abc123 \
-H "Authorization: Bearer $LITELLM_API_KEY" \
-H "Content-Type: application/json" \
4. Cancel a Batch
- OpenAI Python SDK
- Curl
cancelled_batch = client.batches.cancel(
batch.id,
extra_headers={"custom-llm-provider":"azure"}
)
curl http://localhost:4000/v1/batches/batch_abc123/cancel \
-H "Authorization: Bearer $LITELLM_API_KEY" \
-H "Content-Type: application/json" \
-X POST
5. List Batches
- OpenAI Python SDK
- Curl
client.batches.list(extra_headers={"custom-llm-provider":"azure"})
curl http://localhost:4000/v1/batches?limit=2 \
-H "Authorization: Bearer $LITELLM_API_KEY" \
-H "Content-Type: application/json"
1. Create File for Batch Completion
from litellm
import os
os.environ["AZURE_API_KEY"]=""
os.environ["AZURE_API_BASE"]=""
file_name ="azure_batch_completions.jsonl"
_current_dir = os.path.dirname(os.path.abspath(__file__))
file_path = os.path.join(_current_dir, file_name)
file_obj =await litellm.acreate_file(
file=open(file_path,"rb"),
purpose="batch",
custom_llm_provider="azure",
)
print("Response from creating file=", file_obj)
2. Create Batch Request
create_batch_response =await litellm.acreate_batch(
completion_window="24h",
endpoint="/v1/chat/completions",
input_file_id=batch_input_file_id,
custom_llm_provider="azure",
metadata={"key1":"value1","key2":"value2"},
)
print("response from litellm.create_batch=", create_batch_response)
3. Retrieve Batch and File Content
retrieved_batch =await litellm.aretrieve_batch(
batch_id=create_batch_response.id,
custom_llm_provider="azure"
)
print("retrieved batch=", retrieved_batch)
# Get file content
file_content =await litellm.afile_content(
file_id=batch_input_file_id,
custom_llm_provider="azure"
)
print("file content = ", file_content)
4. List Batches
list_batches_response = litellm.list_batches(
custom_llm_provider="azure",
limit=2
)
print("list_batches_response=", list_batches_response)
Health Check Azure Batch models
[BETA] Loadbalance Multiple Azure Deployments
In your config.yaml, set enable_loadbalancing_on_batch_endpoints: true
model_list:
-model_name:"batch-gpt-4o-mini"
litellm_params:
model:"azure/gpt-4o-mini"
api_key: os.environ/AZURE_API_KEY
api_base: os.environ/AZURE_API_BASE
model_info:
mode: batch
litellm_settings:
enable_loadbalancing_on_batch_endpoints:true# 👈 KEY CHANGE
Note: This works on {PROXY_BASE_URL}/v1/files and {PROXY_BASE_URL}/v1/batches.
Note: Response is in the OpenAI-format.
- Upload a file
Just set model: batch-gpt-4o-mini in your .jsonl.
curl http://localhost:4000/v1/files \
-H "Authorization: Bearer sk-1234" \
-F purpose="batch" \
-F file="@mydata.jsonl"
Example File
Note: model should be your azure deployment name.
{"custom_id":"task-0","method":"POST","url":"/chat/completions","body":{"model":"batch-gpt-4o-mini","messages":[{"role":"system","content":"You are an AI assistant that helps people find information."},{"role":"user","content":"When was Microsoft founded?"}]}}
{"custom_id":"task-1","method":"POST","url":"/chat/completions","body":{"model":"batch-gpt-4o-mini","messages":[{"role":"system","content":"You are an AI assistant that helps people find information."},{"role":"user","content":"When was the first XBOX released?"}]}}
{"custom_id":"task-2","method":"POST","url":"/chat/completions","body":{"model":"batch-gpt-4o-mini","messages":[{"role":"system","content":"You are an AI assistant that helps people find information."},{"role":"user","content":"What is Altair Basic?"}]}}
Expected Response (OpenAI-compatible)
{"id":"file-f0be81f654454113a922da60acb0eea6",...}
- Create a batch
curl http://0.0.0.0:4000/v1/batches \
-H "Authorization: Bearer $LITELLM_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"input_file_id": "file-f0be81f654454113a922da60acb0eea6",
"endpoint": "/v1/chat/completions",
"completion_window": "24h",
"model: "batch-gpt-4o-mini"
}'
Expected Response:
{"id":"batch_94e43f0a-d805-477d-adf9-bbb9c50910ed",...}
- Retrieve a batch
curl http://0.0.0.0:4000/v1/batches/batch_94e43f0a-d805-477d-adf9-bbb9c50910ed \
-H "Authorization: Bearer $LITELLM_API_KEY" \
-H "Content-Type: application/json" \
Expected Response:
{"id":"batch_94e43f0a-d805-477d-adf9-bbb9c50910ed",...}
- List batch
curl http://0.0.0.0:4000/v1/batches?limit=2 \
-H "Authorization: Bearer $LITELLM_API_KEY" \
-H "Content-Type: application/json"
Expected Response:
{"data":[{"id":"batch_R3V...}
Advanced
Azure API Load-Balancing
Use this if you're trying to load-balance across multiple Azure/OpenAI deployments.
Router prevents failed requests, by picking the deployment which is below rate-limit and has the least amount of tokens used.
In production, Router connects to a Redis Cache to track usage across multiple deployments.
Quick Start
uv add litellm
from litellm import Router
model_list =[{# list of model deployments
"model_name":"gpt-3.5-turbo",# openai model name
"litellm_params":{# params for litellm completion/embedding call
"model":"azure/chatgpt-v-2",
"api_key": os.getenv("AZURE_API_KEY"),
"api_version": os.getenv("AZURE_API_VERSION"),
"api_base": os.getenv("AZURE_API_BASE")
},
"tpm":240000,
"rpm":1800
},{
"model_name":"gpt-3.5-turbo",# openai model name
"litellm_params":{# params for litellm completion/embedding call
"model":"azure/chatgpt-functioncalling",
"api_key": os.getenv("AZURE_API_KEY"),
"api_version": os.getenv("AZURE_API_VERSION"),
"api_base": os.getenv("AZURE_API_BASE")
},
"tpm":240000,
"rpm":1800
},{
"model_name":"gpt-3.5-turbo",# openai model name
"litellm_params":{# params for litellm completion/embedding call
"model":"gpt-3.5-turbo",
"api_key": os.getenv("OPENAI_API_KEY"),
},
"tpm":1000000,
"rpm":9000
}]
router = Router(model_list=model_list)
# openai.chat.completions.create replacement
response = router.completion(model="gpt-3.5-turbo",
messages=[{"role":"user","content":"Hey, how's it going?"}]
print(response)
Redis Queue
router = Router(model_list=model_list,
redis_host=os.getenv("REDIS_HOST"),
redis_password=os.getenv("REDIS_PASSWORD"),
redis_port=os.getenv("REDIS_PORT"))
print(response)
Tool Calling / Function Calling
See a detailed walthrough of parallel function calling with litellm here
- SDK
- PROXY
# set Azure env variables
import os
import litellm
import json
os.environ['AZURE_API_KEY']=""# litellm reads AZURE_API_KEY from .env and sends the request
os.environ['AZURE_API_BASE']="https://openai-gpt-4-test-v-1.openai.azure.com/"
os.environ['AZURE_API_VERSION']="2023-07-01-preview"
tools =[
{
"type":"function",
"function":{
"name":"get_current_weather",
"description":"Get the current weather in a given location",
"parameters":{
"type":"object",
"properties":{
"location":{
"type":"string",
"description":"The city and state, e.g. San Francisco, CA",
},
"unit":{"type":"string","enum":["celsius","fahrenheit"]},
},
"required":["location"],
},
},
}
]
response = litellm.completion(
model="azure/chatgpt-functioncalling",# model = azure/<your-azure-deployment-name>
messages=[{"role":"user","content":"What's the weather like in San Francisco, Tokyo, and Paris?"}],
tools=tools,
tool_choice="auto",# auto is default, but we'll be explicit
)
print("\nLLM Response1:\n", response)
response_message = response.choices[0].message
tool_calls = response.choices[0].message.tool_calls
print("\nTool Choice:\n", tool_calls)
- Setup config.yaml
model_list:
-model_name: azure-gpt-3.5
litellm_params:
model: azure/chatgpt-functioncalling
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
api_version:"2023-07-01-preview"
- Start proxy
litellm --config config.yaml
- Test it
curl -L -X POST 'http://localhost:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "azure-gpt-3.5",
"messages": [
{
"role": "user",
"content": "Hey, how'\''s it going? Thinking long and hard before replying - what is the meaning of the world and life itself"
}
]
}'
Spend Tracking for Azure OpenAI Models (PROXY)
Set base model for cost tracking azure image-gen call
Image Generation
model_list:
-model_name: dall-e-3
litellm_params:
model: azure/dall-e-3-test
api_version: 2023-06-01-preview
api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
api_key: os.environ/AZURE_API_KEY
base_model: dall-e-3# 👈 set dall-e-3 as base model
model_info:
mode: image_generation
Chat Completions / Embeddings
Problem: Azure returns gpt-4 in the response when azure/gpt-4-1106-preview is used. This leads to inaccurate cost tracking
Solution ✅ : Set base_model on your config so litellm uses the correct model for calculating azure cost
Get the base model name from here
Example config with base_model
model_list:
-model_name: azure-gpt-3.5
litellm_params:
model: azure/chatgpt-v-2
api_base: os.environ/AZURE_API_BASE
api_key: os.environ/AZURE_API_KEY
api_version:"2023-07-01-preview"
model_info:
base_model: azure/gpt-4-1106-preview
