URL: https://docs.litellm.ai/docs/providers/perplexity

Perplexity AI (pplx-api) | liteLLM


https://www.perplexity.ai

API Key

# env variable
os.environ['PERPLEXITYAI_API_KEY']
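
This page uses two variable names: PERPLEXITYAI_API_KEY in the chat completion examples and PERPLEXITY_API_KEY in the Agent API examples. The following is a minimal sketch, assuming you want to fail early when neither is set. The helper name find_perplexity_key is hypothetical and is not part of LiteLLM.

```python
import os

# Variable names as they appear on this page; the helper itself is hypothetical.
KEY_NAMES = ("PERPLEXITYAI_API_KEY", "PERPLEXITY_API_KEY")


def find_perplexity_key(env=None):
    """Return the first non-empty key found, or raise if none is set."""
    env = os.environ if env is None else env
    for name in KEY_NAMES:
        value = env.get(name, "")
        if value:
            return value
    raise RuntimeError("Set one of: " + ", ".join(KEY_NAMES))
```

Calling find_perplexity_key() before the first request turns a missing key into an immediate, readable error instead of a failed API call.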

Sample Usage

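from litellm import completion
import os

os.environ['PERPLEXITYAI_API_KEY'] = ""  # your Perplexity API key

# Example prompt; replace with your own messages
messages = [{"role": "user", "content": "Who won the World Cup in 2022?"}]

response = completion(
    model="perplexity/sonar-pro",
    messages=messages
)
print(response)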

Sample Usage - Streaming

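from litellm import completion
import os

os.environ['PERPLEXITYAI_API_KEY'] = ""  # your Perplexity API key

# Example prompt; replace with your own messages
messages = [{"role": "user", "content": "Who won the World Cup in 2022?"}]

response = completion(
    model="perplexity/sonar-pro",
    messages=messages,
    stream=True
)

# With stream=True, the response is an iterator of chunks
for chunk in response:
    print(chunk)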
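
Printing each chunk shows the raw objects. To rebuild the full reply text, the deltas can be joined as in the sketch below. It assumes chunks follow the OpenAI-style shape chunk.choices[0].delta.content. The helper collect_stream_text and the stand-in _fake_chunk are hypothetical names used only so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_stream_text(chunks):
    """Join the text deltas from OpenAI-style streaming chunks."""
    parts = []
    for chunk in chunks:
        choices = getattr(chunk, "choices", None) or []
        if not choices:
            continue
        delta = getattr(choices[0], "delta", None)
        content = getattr(delta, "content", None)
        if content:  # skips None and empty strings, e.g. on the final chunk
            parts.append(content)
    return "".join(parts)


def _fake_chunk(text):
    """Stand-in for a streamed chunk (assumed shape, for illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_text = collect_stream_text([_fake_chunk("Hel"), _fake_chunk("lo"), _fake_chunk(None)])
print(demo_text)
```

In real use, pass the iterator returned by completion(..., stream=True) in place of the stand-in list.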

Reasoning Effort

Requires v1.72.6+

See the full LiteLLM guide on reasoning for details.

You can set the reasoning effort by setting the reasoning_effort parameter.

SDK

from litellm import completion
import os

os.environ['PERPLEXITYAI_API_KEY'] = ""  # your Perplexity API key

# Example prompt; replace with your own messages
messages = [{"role": "user", "content": "Who won the World Cup in 2022?"}]

response = completion(
    model="perplexity/sonar-reasoning",
    messages=messages,
    reasoning_effort="high"
)
print(response)

Proxy

  1. Set up config.yaml

model_list:
  - model_name: perplexity-sonar-reasoning-model
    litellm_params:
      model: perplexity/sonar-reasoning
      api_key: os.environ/PERPLEXITYAI_API_KEY

  2. Start the proxy

litellm --config /path/to/config.yaml

  3. Test it!

Replace "anything" with your LiteLLM Proxy virtual key, if you have set one up.

curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer anything" \
  -d '{
    "model": "perplexity-sonar-reasoning-model",
    "messages": [{"role": "user", "content": "Who won the World Cup in 2022?"}],
    "reasoning_effort": "high"
  }'
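
The same request can be prepared from Python. This is a minimal sketch using only the standard library, assuming the proxy address and model alias from the steps above. The function name build_chat_request is hypothetical, and the sketch only builds the request without sending it.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt, reasoning_effort=None):
    """Build (but do not send) a POST request for the proxy's chat endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    if reasoning_effort is not None:
        body["reasoning_effort"] = reasoning_effort
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_chat_request(
    "http://0.0.0.0:4000",
    "anything",  # replace with your virtual key, if you have set one up
    "perplexity-sonar-reasoning-model",
    "Who won the World Cup in 2022?",
    reasoning_effort="high",
)
# To send it (requires the proxy to be running):
#     with urllib.request.urlopen(req) as resp:
#         print(json.load(resp))
```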

Supported Models

All models listed at https://docs.perplexity.ai/docs/model-cards are supported. Set model=perplexity/<model-name>.

| Model Name | Function Call |
|---|---|
| sonar-deep-research | completion(model="perplexity/sonar-deep-research", messages=messages) |
| sonar-reasoning-pro | completion(model="perplexity/sonar-reasoning-pro", messages=messages) |
| sonar-reasoning | completion(model="perplexity/sonar-reasoning", messages=messages) |
| sonar-pro | completion(model="perplexity/sonar-pro", messages=messages) |
| sonar | completion(model="perplexity/sonar", messages=messages) |
| r1-1776 | completion(model="perplexity/r1-1776", messages=messages) |

Agent API (Responses API)

Requires v1.72.6+

Using Presets

Presets provide optimized defaults for specific use cases. Start with a preset for quick setup:

SDK

from litellm import responses
import os

os.environ['PERPLEXITY_API_KEY'] = ""  # your Perplexity API key

# Using the pro-search preset
response = responses(
    model="perplexity/preset/pro-search",
    input="What are the latest developments in AI?",
    custom_llm_provider="perplexity",
)

print(response.output)

Proxy

  1. Set up config.yaml

model_list:
  - model_name: perplexity-pro-search
    litellm_params:
      model: perplexity/preset/pro-search
      api_key: os.environ/PERPLEXITY_API_KEY

  2. Start the proxy

litellm --config /path/to/config.yaml

  3. Test it!

curl http://0.0.0.0:4000/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer anything" \
  -d '{
    "model": "perplexity-pro-search",
    "input": "What are the latest developments in AI?"
  }'

Using Third-Party Models

Access models from OpenAI, Anthropic, Google, xAI, and other providers through Perplexity's unified API:

OpenAI

from litellm import responses
import os

os.environ['PERPLEXITY_API_KEY'] = ""  # your Perplexity API key

response = responses(
    model="perplexity/openai/gpt-5.2",
    input="Explain quantum computing in simple terms",
    custom_llm_provider="perplexity",
    max_output_tokens=500,
)

print(response.output)

Anthropic

from litellm import responses
import os

os.environ['PERPLEXITY_API_KEY'] = ""  # your Perplexity API key

response = responses(
    model="perplexity/anthropic/claude-sonnet-4-5",
    input="Write a short story about a robot learning to paint",
    custom_llm_provider="perplexity",
    max_output_tokens=500,
)

print(response.output)

Google

from litellm import responses
import os

os.environ['PERPLEXITY_API_KEY'] = ""  # your Perplexity API key

response = responses(
    model="perplexity/google/gemini-2.5-flash",
    input="Explain the concept of neural networks",
    custom_llm_provider="perplexity",
    max_output_tokens=500,
)

print(response.output)

xAI

from litellm import responses
import os

os.environ['PERPLEXITY_API_KEY'] = ""  # your Perplexity API key

response = responses(
    model="perplexity/xai/grok-4-1-fast-non-reasoning",
    input="What makes a good AI assistant?",
    custom_llm_provider="perplexity",
    max_output_tokens=500,
)

print(response.output)

Web Search Tool

Enable web search capabilities to access real-time information:

from litellm import responses
import os

os.environ['PERPLEXITY_API_KEY'] = ""  # your Perplexity API key

response = responses(
    model="perplexity/openai/gpt-5.2",
    input="What's the weather in San Francisco today?",
    custom_llm_provider="perplexity",
    tools=[{"type": "web_search"}],
    instructions="You have access to a web_search tool. Use it for questions about current events.",
)

print(response.output)

Function Calling

The Agent API supports custom function tools. Pass function tools through unchanged:

from litellm import responses
import os

os.environ['PERPLEXITY_API_KEY'] = ""  # your Perplexity API key

response = responses(
    model="perplexity/openai/gpt-5.2",
    input="What's the weather in San Francisco?",
    custom_llm_provider="perplexity",
    tools=[
        {"type": "web_search"},
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                },
            },
        },
    ],
    instructions="Use tools when appropriate.",
)

print(response.output)
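
When the model decides to call get_weather, your code has to run the function itself. The sketch below shows one way to dispatch such calls. It assumes each output item can be read as a dict with OpenAI Responses-style fields (type "function_call", name, arguments as a JSON string, call_id); this page does not confirm that shape, so check the actual response.output before relying on it. The names LOCAL_TOOLS and run_function_calls are hypothetical, and get_weather is a placeholder.

```python
import json


def get_weather(location, unit="celsius"):
    """Placeholder implementation; a real one would query a weather service."""
    return {"location": location, "unit": unit, "forecast": "unknown"}


# Maps tool names declared in the request to local Python callables.
LOCAL_TOOLS = {"get_weather": get_weather}


def run_function_calls(output_items):
    """Run every recognised function call and collect the results.

    Assumed item shape (not confirmed by this page):
    {"type": "function_call", "name": ..., "arguments": "<json>", "call_id": ...}
    """
    results = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue  # e.g. message or web search items
        func = LOCAL_TOOLS.get(item.get("name"))
        if func is None:
            continue  # unknown tool name; ignore rather than guess
        args = json.loads(item.get("arguments") or "{}")
        results.append({"call_id": item.get("call_id"), "result": func(**args)})
    return results
```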

Structured Outputs

Request JSON schema structured outputs via the text parameter:

from litellm import responses
import os

os.environ['PERPLEXITY_API_KEY'] = ""  # your Perplexity API key

response = responses(
    model="perplexity/preset/pro-search",
    input="Extract key facts about the Eiffel Tower",
    custom_llm_provider="perplexity",
    text={
        "format": {
            "type": "json_schema",
            "name": "facts",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "height_meters": {"type": "number"},
                    "year_built": {"type": "integer"},
                },
                "required": ["name", "height_meters", "year_built"],
            },
            "strict": True,
        }
    },
)

print(response.output)
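
Even with a strict schema, it is reasonable to check the returned JSON before using it. The sketch below validates a JSON string against the three required fields from the schema above. It assumes you have already extracted the output text from response.output, which is outside this sketch. The name parse_facts and the sample string are illustrative only.

```python
import json

# Field names and types mirror the "facts" schema in the request above.
REQUIRED_FIELDS = {
    "name": str,
    "height_meters": (int, float),
    "year_built": int,
}


def parse_facts(json_text):
    """Parse the model's JSON text and check required fields and their types."""
    data = json.loads(json_text)
    for key, expected in REQUIRED_FIELDS.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"wrong type for field: {key}")
    return data


# Hand-written sample, not real model output.
sample = '{"name": "Eiffel Tower", "height_meters": 330, "year_built": 1889}'
facts = parse_facts(sample)
```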

Reasoning Effort (Responses API)

Control the reasoning effort level for reasoning-capable models:

from litellm import responses
import os

os.environ['PERPLEXITY_API_KEY'] = ""  # your Perplexity API key

response = responses(
    model="perplexity/openai/gpt-5.2",
    input="Solve this complex problem step by step",
    custom_llm_provider="perplexity",
    reasoning={"effort": "high"},  # Options: low, medium, high
    max_output_tokens=1000,
)

print(response.output)

Multi-Turn Conversations

Use message arrays for multi-turn conversations with context:

from litellm import responses
import os

os.environ['PERPLEXITY_API_KEY'] = ""  # your Perplexity API key

response = responses(
    model="perplexity/anthropic/claude-sonnet-4-5",
    input=[
        {"type": "message", "role": "system", "content": "You are a helpful assistant."},
        {"type": "message", "role": "user", "content": "What are the latest AI developments?"},
    ],
    custom_llm_provider="perplexity",
    instructions="Provide detailed, well-researched answers.",
    max_output_tokens=800,
)

print(response.output)
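
To continue the conversation, the next request needs the earlier turns plus the new user message. The sketch below builds that list. It assumes an assistant turn can be sent back in the same {"type": "message", ...} shape as the system and user items above, which this page does not state explicitly. The helper names message and next_input are hypothetical.

```python
def message(role, content):
    """Build one input item in the shape used by the example above."""
    return {"type": "message", "role": role, "content": content}


def next_input(history, assistant_text, user_text):
    """Return a new input list: prior turns, the assistant reply, the new question.

    Assumes assistant turns use the same item shape as user turns.
    The original history list is left unmodified.
    """
    return history + [message("assistant", assistant_text), message("user", user_text)]


history = [
    message("system", "You are a helpful assistant."),
    message("user", "What are the latest AI developments?"),
]
# "(assistant reply text)" stands in for text taken from the previous response.
follow_up = next_input(history, "(assistant reply text)", "Which of these matter most?")
```

The resulting follow_up list would be passed as input= in the next responses(...) call.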

Streaming Responses

Stream responses for real-time output:

from litellm import responses
import os

os.environ['PERPLEXITY_API_KEY'] = ""  # your Perplexity API key

response = responses(
    model="perplexity/openai/gpt-5.2",
    input="Tell me a story about space exploration",
    custom_llm_provider="perplexity",
    stream=True,
    max_output_tokens=500,
)

# Print only the text deltas; other event types are skipped
for chunk in response:
    if hasattr(chunk, 'type'):
        if chunk.type == "response.output_text.delta":
            print(chunk.delta, end="", flush=True)
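
If you need the complete text after streaming finishes, the same filter can accumulate the deltas instead of printing them. This is a minimal sketch under the event shape used in the loop above (a type attribute and a delta attribute). The helper collect_output_text is a hypothetical name, and the stand-in events exist only so the example runs offline.

```python
from types import SimpleNamespace


def collect_output_text(events):
    """Concatenate the delta of every response.output_text.delta event."""
    parts = []
    for event in events:
        if getattr(event, "type", None) == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)


# Stand-in events for illustration; real events come from responses(..., stream=True).
demo_events = [
    SimpleNamespace(type="response.output_text.delta", delta="Once "),
    SimpleNamespace(type="response.output_text.delta", delta="upon a time"),
    SimpleNamespace(type="response.completed"),
]
story = collect_output_text(demo_events)
```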

Supported Third-Party Models

| Provider | Model Name | Function Call |
|---|---|---|
| OpenAI | gpt-5.2 | responses(model="perplexity/openai/gpt-5.2", ...) |
| OpenAI | gpt-5.1 | responses(model="perplexity/openai/gpt-5.1", ...) |
| OpenAI | gpt-5-mini | responses(model="perplexity/openai/gpt-5-mini", ...) |
| Anthropic | claude-opus-4-6 | responses(model="perplexity/anthropic/claude-opus-4-6", ...) |
| Anthropic | claude-opus-4-5 | responses(model="perplexity/anthropic/claude-opus-4-5", ...) |
| Anthropic | claude-sonnet-4-5 | responses(model="perplexity/anthropic/claude-sonnet-4-5", ...) |
| Anthropic | claude-haiku-4-5 | responses(model="perplexity/anthropic/claude-haiku-4-5", ...) |
| Google | gemini-3-pro-preview | responses(model="perplexity/google/gemini-3-pro-preview", ...) |
| Google | gemini-3-flash-preview | responses(model="perplexity/google/gemini-3-flash-preview", ...) |
| Google | gemini-2.5-pro | responses(model="perplexity/google/gemini-2.5-pro", ...) |
| Google | gemini-2.5-flash | responses(model="perplexity/google/gemini-2.5-flash", ...) |
| xAI | grok-4-1-fast-non-reasoning | responses(model="perplexity/xai/grok-4-1-fast-non-reasoning", ...) |
| Perplexity | sonar | responses(model="perplexity/perplexity/sonar", ...) |

Available Presets

| Preset Name | Function Call |
|---|---|
| fast-search | responses(model="perplexity/preset/fast-search", ...) |
| pro-search | responses(model="perplexity/preset/pro-search", ...) |
| deep-research | responses(model="perplexity/preset/deep-research", ...) |
| advanced-deep-research | responses(model="perplexity/preset/advanced-deep-research", ...) |
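
Preset model strings all follow the pattern perplexity/preset/<preset-name>. The sketch below builds the string and rejects names that are not in the list above, which catches typos before a request is sent. The helper preset_model is hypothetical, and the hard-coded tuple only reflects the presets listed on this page, so it may need updating.

```python
# Presets as listed in the table above; this tuple is a snapshot, not an API.
PRESETS = ("fast-search", "pro-search", "deep-research", "advanced-deep-research")


def preset_model(preset):
    """Return the model string for a preset, e.g. 'perplexity/preset/pro-search'."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {PRESETS}")
    return f"perplexity/preset/{preset}"
```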

Complete Example

from litellm import responses
import os

os.environ['PERPLEXITY_API_KEY'] = ""  # your Perplexity API key

# Comprehensive example with multiple features
response = responses(
    model="perplexity/openai/gpt-5.2",
    input="Research the latest developments in quantum computing and provide sources",
    custom_llm_provider="perplexity",
    tools=[
        {"type": "web_search"},
        {"type": "fetch_url"},
    ],
    instructions="Use web_search to find relevant information and fetch_url to retrieve detailed content from sources. Provide citations for all claims.",
    max_output_tokens=1000,
    temperature=0.7,
)

print(f"Response ID: {response.id}")
print(f"Model: {response.model}")
print(f"Status: {response.status}")
print(f"Output: {response.output}")
print(f"Usage: {response.usage}")

For more information about passing provider-specific parameters, see the LiteLLM documentation on provider-specific parameters.