URL: https://docs.litellm.ai/docs/providers/deepinfra

DeepInfra

https://deepinfra.com/

Tip: LiteLLM supports all DeepInfra models. Prefix the model name with deepinfra/ when sending LiteLLM requests, i.e. set model=deepinfra/<any-model-on-deepinfra>.

API Key

import os

# Set the API key as an environment variable
os.environ['DEEPINFRA_API_KEY'] = "your-api-key"
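
A minimal sketch, assuming you want to fail with a clear message before any request is sent when the key is missing. `require_env` is a hypothetical helper written for illustration; it is not part of LiteLLM.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of LiteLLM.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before calling LiteLLM")
    return value
```

Calling `require_env("DEEPINFRA_API_KEY")` once at startup is usually enough, since the key is read from the environment afterwards.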

Sample Usage

from litellm import completion
import os

os.environ['DEEPINFRA_API_KEY'] = ""
response = completion(
    model="deepinfra/meta-llama/Llama-2-70b-chat-hf",
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}]
)
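
The sample above stores the result in `response` without reading it. A minimal sketch of pulling out the generated text, assuming the response follows the OpenAI chat-completion shape (`choices[0].message.content`); `reply_text` is a hypothetical helper, not a LiteLLM function.

```python
def reply_text(response) -> str:
    """Return the assistant text of the first choice.

    Assumes an OpenAI-style response object; returns an empty string
    when the content field is None.
    """
    content = response.choices[0].message.content
    return content or ""
```

With the sample above, `print(reply_text(response))` would print only the model's answer rather than the whole response object.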

Sample Usage - Streaming

from litellm import completion
import os

os.environ['DEEPINFRA_API_KEY'] = ""
response = completion(
    model="deepinfra/meta-llama/Llama-2-70b-chat-hf",
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}],
    stream=True
)

# Each chunk carries an incremental piece of the response
for chunk in response:
    print(chunk)
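
Printing raw chunks is useful for debugging, but most callers want the assembled text. A minimal sketch, assuming each chunk exposes `choices[0].delta.content` in the OpenAI streaming style and that the content can be `None` on some chunks; `collect_stream` is a hypothetical helper, not part of LiteLLM.

```python
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Concatenate the text deltas of a streamed chat completion.

    Assumes OpenAI-style chunks; skips chunks with no choices and
    deltas whose content is missing or None.
    """
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        text = getattr(delta, "content", None)
        if text:
            parts.append(text)
    return "".join(parts)
```

Usage: pass the object returned by `completion(..., stream=True)` directly, e.g. `text = collect_stream(response)`. The stream is consumed by the call, so it cannot be iterated a second time.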

Chat Models

| Model Name | Function Call |
|---|---|
| meta-llama/Meta-Llama-3-8B-Instruct | completion(model="deepinfra/meta-llama/Meta-Llama-3-8B-Instruct", messages=messages) |
| meta-llama/Meta-Llama-3-70B-Instruct | completion(model="deepinfra/meta-llama/Meta-Llama-3-70B-Instruct", messages=messages) |
| meta-llama/Llama-2-70b-chat-hf | completion(model="deepinfra/meta-llama/Llama-2-70b-chat-hf", messages=messages) |
| meta-llama/Llama-2-7b-chat-hf | completion(model="deepinfra/meta-llama/Llama-2-7b-chat-hf", messages=messages) |
| meta-llama/Llama-2-13b-chat-hf | completion(model="deepinfra/meta-llama/Llama-2-13b-chat-hf", messages=messages) |
| codellama/CodeLlama-34b-Instruct-hf | completion(model="deepinfra/codellama/CodeLlama-34b-Instruct-hf", messages=messages) |
| mistralai/Mistral-7B-Instruct-v0.1 | completion(model="deepinfra/mistralai/Mistral-7B-Instruct-v0.1", messages=messages) |
| jondurbin/airoboros-l2-70b-gpt4-1.4.1 | completion(model="deepinfra/jondurbin/airoboros-l2-70b-gpt4-1.4.1", messages=messages) |

Rerank Endpoint

LiteLLM provides a Cohere-compatible /rerank endpoint for DeepInfra rerank models.

Supported Rerank Models

| Model Name | Description |
|---|---|
| deepinfra/Qwen/Qwen3-Reranker-0.6B | Lightweight rerank model (0.6B parameters) |
| deepinfra/Qwen/Qwen3-Reranker-4B | Medium rerank model (4B parameters) |
| deepinfra/Qwen/Qwen3-Reranker-8B | Large rerank model (8B parameters) |

Usage - LiteLLM Python SDK

SDK

from litellm import rerank
import os

os.environ["DEEPINFRA_API_KEY"] = "your-api-key"

response = rerank(
    model="deepinfra/Qwen/Qwen3-Reranker-0.6B",
    query="What is the capital of France?",
    documents=[
        "Paris is the capital of France.",
        "London is the capital of the United Kingdom.",
        "Berlin is the capital of Germany.",
        "Madrid is the capital of Spain.",
        "Rome is the capital of Italy."
    ]
)
print(response)

PROXY

1. Add to config.yaml

model_list:
  - model_name: Qwen/Qwen3-Reranker-0.6B
    litellm_params:
      model: deepinfra/Qwen/Qwen3-Reranker-0.6B
      api_key: os.environ/DEEPINFRA_API_KEY

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000/

3. Test it!

curl -L -X POST 'http://0.0.0.0:4000/rerank' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "Qwen/Qwen3-Reranker-0.6B",
    "query": "What is the capital of France?",
    "documents": [
      "Paris is the capital of France.",
      "London is the capital of the United Kingdom.",
      "Berlin is the capital of Germany.",
      "Madrid is the capital of Spain.",
      "Rome is the capital of Italy."
    ]
  }'
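
The same proxy request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call (URL path, headers, JSON body); `build_rerank_request` and `send` are hypothetical helper names, and the base URL and `sk-1234` key are the placeholder values from the curl example, not real credentials.

```python
import json
import urllib.request


def build_rerank_request(base_url, api_key, model, query, documents):
    """Build (but do not send) a POST request to the proxy's /rerank endpoint."""
    payload = {"model": model, "query": query, "documents": list(documents)}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the proxy running, `send(build_rerank_request("http://0.0.0.0:4000", "sk-1234", "Qwen/Qwen3-Reranker-0.6B", "What is the capital of France?", documents))` would issue the request shown in the curl example. Building and sending are kept separate so the request can be inspected without a running proxy.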

Supported Cohere Rerank API Params

| Param | Type | Description |
|---|---|---|
| query | str | The query to rerank the documents against |
| documents | list[str] | The documents to rerank |

Provider-specific parameters

Pass any DeepInfra-specific parameters as keyword arguments to the rerank function, e.g.:

from litellm import rerank

response = rerank(
    model="deepinfra/Qwen/Qwen3-Reranker-0.6B",
    query="What is the capital of France?",
    documents=[
        "Paris is the capital of France.",
        "London is the capital of the United Kingdom.",
        "Berlin is the capital of Germany.",
        "Madrid is the capital of Spain.",
        "Rome is the capital of Italy."
    ],
    my_custom_param="my_custom_value",  # any other DeepInfra-specific parameters
)

Response Format

{
  "id": "request-id",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.9975274205207825
    },
    {
      "index": 1,
      "relevance_score": 0.011687257327139378
    }
  ],
  "meta": {
    "billed_units": {
      "total_tokens": 427
    },
    "tokens": {
      "input_tokens": 427,
      "output_tokens": 0
    }
  }
}
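
The results carry only an index and a score, so they need to be joined back to the submitted documents. A minimal sketch, assuming the response is available as a plain dict in the format above and that `index` is the position of the document in the `documents` list that was sent (the Cohere rerank convention); `rank_documents` is a hypothetical helper, not part of LiteLLM.

```python
def rank_documents(response: dict, documents: list) -> list:
    """Pair each result with its source document, most relevant first.

    Assumes response["results"] is a list of {"index", "relevance_score"}
    entries whose index points into `documents`.
    """
    ordered = sorted(
        response["results"],
        key=lambda r: r["relevance_score"],
        reverse=True,
    )
    return [(documents[r["index"]], r["relevance_score"]) for r in ordered]


documents = [
    "Paris is the capital of France.",
    "London is the capital of the United Kingdom.",
]
response = {
    "results": [
        {"index": 1, "relevance_score": 0.011687257327139378},
        {"index": 0, "relevance_score": 0.9975274205207825},
    ]
}
for text, score in rank_documents(response, documents):
    print(f"{score:.4f}  {text}")
```

Sorting explicitly keeps the helper correct even if the results are not already ordered by score. If the SDK returns a response object rather than a dict, adapt the field access accordingly.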