URL: https://docs.litellm.ai/docs/providers/snowflake

Snowflake Cortex | liteLLM


LiteLLM supports all models on the Snowflake Cortex REST API, including models from Anthropic (Claude), OpenAI (GPT), Meta (Llama), Mistral, DeepSeek, and Snowflake.

Description: The Snowflake Cortex REST API provides access to leading frontier LLMs through OpenAI-compatible and Anthropic-compatible endpoints. All inference runs within Snowflake's security perimeter.
Provider Route on LiteLLM: snowflake/
Provider Docs: Cortex REST API
API Endpoints:
  Chat Completions: https://{account}.snowflakecomputing.com/api/v2/cortex/v1/chat/completions
  Messages: https://{account}.snowflakecomputing.com/api/v2/cortex/v1/messages
  Legacy: https://{account}.snowflakecomputing.com/api/v2/cortex/inference:complete
Supported OpenAI Endpoints: /chat/completions, /completions, /embeddings
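
The endpoint URLs above all follow one pattern per account. A minimal sketch of building them from an account identifier is shown here; the helper names `cortex_endpoints` and `cortex_api_base` are hypothetical and not part of LiteLLM.

```python
# Hypothetical helpers (not part of LiteLLM): build the Cortex REST API URLs
# listed above from a Snowflake account identifier.
CORTEX_PATHS = {
    "chat_completions": "/api/v2/cortex/v1/chat/completions",
    "messages": "/api/v2/cortex/v1/messages",
    "legacy": "/api/v2/cortex/inference:complete",
}


def cortex_endpoints(account: str) -> dict:
    """Return the full endpoint URLs for one account."""
    host = f"https://{account}.snowflakecomputing.com"
    return {name: host + path for name, path in CORTEX_PATHS.items()}


def cortex_api_base(account: str) -> str:
    """Return the base URL in the form used for SNOWFLAKE_API_BASE on this page."""
    return f"https://{account}.snowflakecomputing.com/api/v2/cortex/v1"


if __name__ == "__main__":
    for name, url in cortex_endpoints("myorg-myaccount").items():
        print(f"{name}: {url}")
```

The account value `myorg-myaccount` is a placeholder; substitute your own account identifier.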

Tip: LiteLLM supports all Snowflake Cortex models. Prefix the model name with snowflake/ (model=snowflake/<model-name>) when sending LiteLLM requests.

Authentication

Snowflake Cortex REST API supports three authentication methods.

Programmatic Access Token (PAT)

The simplest approach. Generate a PAT in Snowsight under User Menu → My Profile → Programmatic Access Tokens.

import os
from litellm import completion

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-programmatic-access-token>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
)

JWT (Key-Pair Authentication)

Generate a JWT from a Snowflake key pair. See Key-pair authentication.

import os
from litellm import completion

os.environ["SNOWFLAKE_JWT"] = "<your-jwt-token>"
os.environ["SNOWFLAKE_ACCOUNT_ID"] = "<orgname>-<account_name>"

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
)
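
The JWT itself has to be built and signed before it is handed to LiteLLM. A minimal sketch of the claim set is shown here, assuming the layout described in Snowflake's key-pair authentication documentation (uppercase account and user, issuer suffixed with the public key fingerprint, lifetime of at most one hour). The function name `snowflake_jwt_claims` is hypothetical, and the RS256 signing step, which needs a JWT library and your private key, is deliberately left out.

```python
import time


def snowflake_jwt_claims(account, user, public_key_fp, lifetime_s=3600, now=None):
    """Build the unsigned JWT claims (hypothetical helper, not a LiteLLM API).

    Assumed inputs:
      account       -- account identifier, e.g. "<orgname>-<account_name>"
      user          -- Snowflake user name
      public_key_fp -- fingerprint of the registered public key, "SHA256:..."
    """
    issued_at = int(time.time()) if now is None else int(now)
    qualified_user = f"{account.upper()}.{user.upper()}"
    return {
        "iss": f"{qualified_user}.{public_key_fp}",  # issuer includes the key fingerprint
        "sub": qualified_user,
        "iat": issued_at,
        "exp": issued_at + lifetime_s,
    }


if __name__ == "__main__":
    print(snowflake_jwt_claims("myorg-myaccount", "jsmith", "SHA256:<fingerprint>"))
```

Check the claim layout against the Snowflake documentation linked above before relying on it; the signed token is what goes into SNOWFLAKE_JWT.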

Pass credentials as parameters

from litellm import completion

# Using PAT
response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="pat/<your-pat-token>",
    api_base="https://<account>.snowflakecomputing.com/api/v2/cortex/v1",
)

# Using JWT
response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="<your-jwt-token>",
    account_id="<orgname>-<account_name>",
)

For all authentication options, see Authenticating to Cortex REST API.

Usage

SDK

from litellm import completion
import os

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "What is Snowflake Cortex?"}],
)
print(response.choices[0].message.content)

PROXY

1. Config

model_list:
  - model_name: claude-sonnet
    litellm_params:
      model: snowflake/claude-sonnet-4-5
      api_key: pat/<your-pat>
      api_base: https://<account>.snowflakecomputing.com/api/v2/cortex/v1
  - model_name: llama4-maverick
    litellm_params:
      model: snowflake/llama4-maverick
      api_key: pat/<your-pat>
      api_base: https://<account>.snowflakecomputing.com/api/v2/cortex/v1

2. Start proxy

litellm --config /path/to/config.yaml

3. Test

curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "claude-sonnet",
    "messages": [
      {"role": "user", "content": "What is Snowflake Cortex?"}
    ]
  }'
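
The same test request can be issued from Python. A minimal sketch using only the standard library is shown here, assuming the proxy listens on the address from the curl example; `build_proxy_request` is a hypothetical helper name, and the send step is commented out because it needs a running proxy.

```python
import json
import urllib.request


def build_proxy_request(model, prompt, base_url="http://0.0.0.0:4000"):
    """Build (but do not send) a POST to the proxy's /chat/completions route."""
    payload = {
        "model": model,  # the model_name alias from config.yaml
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_proxy_request("claude-sonnet", "What is Snowflake Cortex?")

# To send it (requires the proxy from step 2 to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

If your proxy requires a key, add the matching authorization header to the request.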

Supported OpenAI Parameters

temperature, max_tokens, top_p, stream, response_format,
tools, tool_choice
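
When request options come from user input or shared config, it can help to check them against this list before calling the provider. A minimal sketch follows; `SUPPORTED_PARAMS` simply copies the list above, and `split_params` is a hypothetical helper, not a LiteLLM function.

```python
# Copied from the parameter list above; update it if the list changes.
SUPPORTED_PARAMS = frozenset(
    {
        "temperature",
        "max_tokens",
        "top_p",
        "stream",
        "response_format",
        "tools",
        "tool_choice",
    }
)


def split_params(params):
    """Split a kwargs dict into (supported, unsupported_names)."""
    supported = {k: v for k, v in params.items() if k in SUPPORTED_PARAMS}
    unsupported = sorted(k for k in params if k not in SUPPORTED_PARAMS)
    return supported, unsupported


if __name__ == "__main__":
    ok, dropped = split_params({"temperature": 0.2, "frequency_penalty": 0.5})
    print("pass through:", ok)
    print("not in the supported list:", dropped)
```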

Streaming

SDK

from litellm import completion
import os

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Write a haiku about data."}],
    stream=True,
)

# Print each text delta as it arrives; some chunks carry no content.
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

PROXY

curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "claude-sonnet",
    "messages": [{"role": "user", "content": "Write a haiku about data."}],
    "stream": true
  }'
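
Often the full text is needed after streaming finishes, not just the printed deltas. A minimal sketch of accumulating the deltas is shown here; it runs against stand-in chunk objects built with `SimpleNamespace` so no network call is made, and both `collect_stream` and `fake_chunk` are hypothetical names.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas of a streamed response into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # defensively skip chunks without choices
        text = chunk.choices[0].delta.content
        if text:  # content may be None or empty on some chunks
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in with the same attribute path as a streamed chunk (demo only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Rows "), fake_chunk(None), fake_chunk("and columns")]

if __name__ == "__main__":
    print(collect_stream(demo))
```

With a real call, pass the object returned by `completion(..., stream=True)` to `collect_stream` in place of `demo`.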

Tool / Function Calling

Supported on Claude and select other models. LiteLLM automatically transforms the OpenAI tool format to Snowflake's tool_spec format.

SDK

from litellm import completion
import os

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"],
            },
        },
    }
]

response = completion(
    model="snowflake/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    tools=tools,
    tool_choice="auto",
)

print(response.choices[0].message.tool_calls)

PROXY

model_list:
  - model_name: claude-sonnet
    litellm_params:
      model: snowflake/claude-sonnet-4-5
      api_key: pat/<your-pat>
      api_base: https://<account>.snowflakecomputing.com/api/v2/cortex/v1

curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "claude-sonnet",
    "messages": [{"role": "user", "content": "What is the weather in SF?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
          "type": "object",
          "properties": {"location": {"type": "string"}},
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto"
  }'
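
Once the model returns tool calls, the caller still has to run them and send the results back. A minimal sketch of that step is shown here, assuming the OpenAI-style tool call shape (`id`, `function.name`, `function.arguments` as a JSON string). The `get_weather` stub, `TOOL_REGISTRY`, and `run_tool_calls` are hypothetical, and the demo uses a stand-in object instead of a live response.

```python
import json
from types import SimpleNamespace


def get_weather(location):
    """Hypothetical local implementation of the get_weather tool."""
    return f"Sunny in {location}"


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls):
    """Execute each requested tool and return OpenAI-style tool messages."""
    messages = []
    for call in tool_calls or []:  # tool_calls is None when no tool was requested
        func = TOOL_REGISTRY[call.function.name]
        args = json.loads(call.function.arguments or "{}")
        messages.append(
            {
                "role": "tool",
                "tool_call_id": call.id,
                "content": str(func(**args)),
            }
        )
    return messages


# Stand-in for response.choices[0].message.tool_calls[0] (demo only).
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(
        name="get_weather", arguments='{"location": "San Francisco"}'
    ),
)

if __name__ == "__main__":
    print(run_tool_calls([fake_call]))
```

In a real exchange, append the assistant message and these tool messages to `messages` and call `completion` again to get the final answer.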

Thinking / Reasoning

Claude 3.7 Sonnet, Claude 4 Opus, and DeepSeek R1 on Cortex support extended thinking. LiteLLM translates reasoning_effort to the provider's thinking parameter.

reasoning_effort    budget_tokens
"low"               1024
"medium"            2048
"high"              4096

from litellm import completion

response = completion(
    model="snowflake/claude-3-7-sonnet",
    messages=[{"role": "user", "content": "Solve: what is 127 * 389?"}],
    reasoning_effort="low",
)
print(response.choices[0].message.content)
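
The mapping in the table can be written out as a lookup. This is a minimal sketch of the translation, not LiteLLM's actual code: the budgets come from the table above, while the `{"type": "enabled", "budget_tokens": ...}` shape is an assumption based on Anthropic's public thinking parameter, and `thinking_from_effort` is a hypothetical name.

```python
# Budgets taken from the reasoning_effort table above.
REASONING_BUDGETS = {"low": 1024, "medium": 2048, "high": 4096}


def thinking_from_effort(effort):
    """Translate a reasoning_effort value into a thinking parameter dict.

    The returned shape is assumed, not confirmed by this page.
    """
    try:
        budget = REASONING_BUDGETS[effort]
    except KeyError:
        valid = ", ".join(sorted(REASONING_BUDGETS))
        raise ValueError(f"reasoning_effort must be one of: {valid}") from None
    return {"type": "enabled", "budget_tokens": budget}


if __name__ == "__main__":
    for level in ("low", "medium", "high"):
        print(level, thinking_from_effort(level))
```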

Prompt Caching

Snowflake Cortex supports prompt caching to reduce costs:

  • OpenAI models: Implicit caching for prompts ≥ 1,024 tokens (no code changes needed)
  • Claude models: Explicit caching via cache_control breakpoints

Cached input tokens are billed at 10% of the regular input rate (90% discount) when ≥ 1,024 tokens are cached.

See Cortex REST API Billing & Cost Analysis for details.
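
To make the discount concrete, the arithmetic can be sketched as follows. Only the 10% factor and the 1,024-token threshold come from the text above; the function name `input_cost` and the `rate_per_million` unit (price per one million input tokens) are assumptions for illustration, and actual billing follows the Snowflake documentation.

```python
CACHE_MIN_TOKENS = 1024    # caching threshold stated above
CACHED_RATE_FACTOR = 0.10  # cached tokens billed at 10% of the input rate


def input_cost(input_tokens, cached_tokens, rate_per_million):
    """Estimate input cost for one request (illustrative only)."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if cached_tokens < CACHE_MIN_TOKENS:
        cached_tokens = 0  # below the threshold, no discount applies
    uncached = input_tokens - cached_tokens
    billable = uncached + cached_tokens * CACHED_RATE_FACTOR
    return billable * rate_per_million / 1_000_000


if __name__ == "__main__":
    # 10,000 input tokens, 8,000 of them cached, at a made-up rate of 3.0 per million.
    print(input_cost(10_000, 8_000, 3.0))
    print(input_cost(10_000, 0, 3.0))
```

In this example the billable volume drops from 10,000 tokens to 2,000 + 800 = 2,800 token-equivalents.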

Embeddings

from litellm import embedding
import os

os.environ["SNOWFLAKE_API_KEY"] = "pat/<your-pat>"
os.environ["SNOWFLAKE_API_BASE"] = "https://<account>.snowflakecomputing.com/api/v2/cortex/v1"

response = embedding(
    model="snowflake/snowflake-arctic-embed-l-v2.0",
    input=["Snowflake Cortex provides LLM inference"],
)
print(response.data[0]["embedding"][:5])
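
Embedding vectors are usually compared rather than printed. A minimal sketch of cosine similarity in plain Python is shown here; `cosine_similarity` is a hypothetical helper, and the short vectors in the demo are made-up stand-ins for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # With a real response: cosine_similarity(response.data[0]["embedding"],
    #                                         response.data[1]["embedding"])
    print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 0.5]))
```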

Supported Models

All models are available through the snowflake/ prefix.

Tip: For current model availability, rate limits, and pricing, see the official Cortex REST API docs and Service Consumption Table.

Chat Completion Models

Model                      litellm model name                     Function Calling   Vision   Prompt Caching
Claude Sonnet 4.5          snowflake/claude-sonnet-4-5
Claude Sonnet 4.6          snowflake/claude-sonnet-4-6
Claude 4 Sonnet            snowflake/claude-4-sonnet
Claude 4 Opus              snowflake/claude-4-opus
Claude Haiku 4.5           snowflake/claude-haiku-4-5
Claude 3.7 Sonnet          snowflake/claude-3-7-sonnet
Claude 3.5 Sonnet          snowflake/claude-3-5-sonnet
OpenAI GPT-4.1             snowflake/openai-gpt-4.1
OpenAI GPT-5               snowflake/openai-gpt-5
OpenAI GPT-5 Mini          snowflake/openai-gpt-5-mini
OpenAI GPT-5 Nano          snowflake/openai-gpt-5-nano
DeepSeek R1                snowflake/deepseek-r1
Mistral Large 2            snowflake/mistral-large2
Llama 3.1 8B               snowflake/llama3.1-8b
Llama 3.1 70B              snowflake/llama3.1-70b
Llama 3.1 405B             snowflake/llama3.1-405b
Llama 3.3 70B              snowflake/llama3.3-70b
Llama 4 Maverick           snowflake/llama4-maverick
Snowflake Llama 3.3 70B    snowflake/snowflake-llama-3.3-70b

Embedding Models

Model                            litellm model name
Snowflake Arctic Embed L v2.0    snowflake/snowflake-arctic-embed-l-v2.0
Snowflake Arctic Embed M v2.0    snowflake/snowflake-arctic-embed-m-v2.0