URL: https://docs.litellm.ai/docs/providers/bedrock_imported

Bedrock Imported Models (Deepseek, Deepseek R1, Qwen, OpenAI-compatible models)

Deepseek R1

DeepSeek R1 has its own route because its chat template differs from that of the other imported models.

| Property | Details |
|---|---|
| Provider Route | bedrock/deepseek_r1/{model_arn} |
| Provider Documentation | Bedrock Imported Models, Deepseek Bedrock Imported Model |
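The imported-model routes on this page all share the shape bedrock/{route}/{model_arn}. As a minimal sketch, the helper below assembles that string and rejects routes that are not listed on this page. `build_imported_model_id` and `IMPORTED_MODEL_ROUTES` are hypothetical names used for illustration and are not part of LiteLLM.

```python
# Hypothetical helper, not part of LiteLLM: builds the `model` string
# for the imported-model routes documented on this page.
IMPORTED_MODEL_ROUTES = {"deepseek_r1", "llama", "qwen3", "qwen2", "openai"}


def build_imported_model_id(route: str, model_arn: str) -> str:
    """Return 'bedrock/{route}/{model_arn}' after basic validation."""
    if route not in IMPORTED_MODEL_ROUTES:
        raise ValueError(f"unknown imported-model route: {route!r}")
    if not model_arn.startswith("arn:"):
        raise ValueError("expected a model ARN starting with 'arn:'")
    return f"bedrock/{route}/{model_arn}"


model = build_imported_model_id(
    "deepseek_r1",
    "arn:aws:bedrock:us-east-1:086734376398:imported-model/r4c4kewx2s0n",
)
print(model)
```

The resulting string can be passed as the `model` argument to `completion` or used as the `model` value under `litellm_params` in the proxy config.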
SDK

from litellm import completion

response = completion(
    model="bedrock/deepseek_r1/arn:aws:bedrock:us-east-1:086734376398:imported-model/r4c4kewx2s0n",  # bedrock/deepseek_r1/{your-model-arn}
    messages=[{"role": "user", "content": "Tell me a joke"}],
)

Proxy

1. Add to config

model_list:
  - model_name: DeepSeek-R1-Distill-Llama-70B
    litellm_params:
      model: bedrock/deepseek_r1/arn:aws:bedrock:us-east-1:086734376398:imported-model/r4c4kewx2s0n

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. Test it!

# "model" must match the 'model_name' in the config
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "DeepSeek-R1-Distill-Llama-70B",
    "messages": [
      {
        "role": "user",
        "content": "what llm are you"
      }
    ]
  }'
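The same request can also be built from Python. This is a minimal sketch that uses only the standard library; `build_chat_request` is a hypothetical helper, and the base URL, the `sk-1234` key, and the model name are the placeholder values from the curl example. Serializing the body with `json.dumps` ensures the payload is strict JSON, which does not allow comments or trailing commas.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the proxy's /chat/completions endpoint."""
    payload = {
        "model": model,  # the 'model_name' from the proxy config
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "http://0.0.0.0:4000",
    "sk-1234",
    "DeepSeek-R1-Distill-Llama-70B",
    "what llm are you",
)

# Sending requires a running proxy, so it is left commented out here:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```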

Deepseek (not R1)

| Property | Details |
|---|---|
| Provider Route | bedrock/llama/{model_arn} |
| Provider Documentation | Bedrock Imported Models, Deepseek Bedrock Imported Model |

Use this route to call Bedrock Imported Models that follow the Llama Invoke request/response spec.

SDK

from litellm import completion

response = completion(
    model="bedrock/llama/arn:aws:bedrock:us-east-1:086734376398:imported-model/r4c4kewx2s0n",  # bedrock/llama/{your-model-arn}
    messages=[{"role": "user", "content": "Tell me a joke"}],
)

Proxy

1. Add to config

model_list:
  - model_name: DeepSeek-R1-Distill-Llama-70B
    litellm_params:
      model: bedrock/llama/arn:aws:bedrock:us-east-1:086734376398:imported-model/r4c4kewx2s0n

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. Test it!

# "model" must match the 'model_name' in the config
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "DeepSeek-R1-Distill-Llama-70B",
    "messages": [
      {
        "role": "user",
        "content": "what llm are you"
      }
    ]
  }'

Qwen3 Imported Models

| Property | Details |
|---|---|
| Provider Route | bedrock/qwen3/{model_arn} |
| Provider Documentation | Bedrock Imported Models, Qwen3 Models |
SDK

from litellm import completion

response = completion(
    model="bedrock/qwen3/arn:aws:bedrock:us-east-1:086734376398:imported-model/your-qwen3-model",  # bedrock/qwen3/{your-model-arn}
    messages=[{"role": "user", "content": "Tell me a joke"}],
    max_tokens=100,
    temperature=0.7,
)

Proxy

1. Add to config

model_list:
  - model_name: Qwen3-32B
    litellm_params:
      model: bedrock/qwen3/arn:aws:bedrock:us-east-1:086734376398:imported-model/your-qwen3-model

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. Test it!

# "model" must match the 'model_name' in the config
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "Qwen3-32B",
    "messages": [
      {
        "role": "user",
        "content": "what llm are you"
      }
    ]
  }'

Qwen2 Imported Models

| Property | Details |
|---|---|
| Provider Route | bedrock/qwen2/{model_arn} |
| Provider Documentation | Bedrock Imported Models |
| Note | Qwen2 and Qwen3 architectures are mostly similar. The main difference is in the response format: Qwen2 uses the "text" field while Qwen3 uses the "generation" field. |
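To illustrate the note on response formats, here is a sketch of reading the generated text from either response shape. Only the field names "text" and "generation" come from the note; the helper name and the sample payloads are assumptions for illustration, and real responses will likely carry additional fields. When you call these routes through LiteLLM you get an OpenAI-style response object, so application code should not normally need this.

```python
def extract_generated_text(raw_response: dict) -> str:
    """Return the generated text from a Qwen3-style or Qwen2-style payload.

    Hypothetical helper: Qwen3 responses are assumed to carry the text under
    "generation" and Qwen2 responses under "text", as described in the note.
    """
    for field in ("generation", "text"):
        value = raw_response.get(field)
        if isinstance(value, str):
            return value
    raise KeyError("response has neither a 'generation' nor a 'text' field")


# Made-up sample payloads, reduced to the one field that differs.
qwen3_like = {"generation": "Hello from Qwen3"}
qwen2_like = {"text": "Hello from Qwen2"}

print(extract_generated_text(qwen3_like))
print(extract_generated_text(qwen2_like))
```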
SDK

from litellm import completion

response = completion(
    model="bedrock/qwen2/arn:aws:bedrock:us-east-1:086734376398:imported-model/your-qwen2-model",  # bedrock/qwen2/{your-model-arn}
    messages=[{"role": "user", "content": "Tell me a joke"}],
    max_tokens=100,
    temperature=0.7,
)

Proxy

1. Add to config

model_list:
  - model_name: Qwen2-72B
    litellm_params:
      model: bedrock/qwen2/arn:aws:bedrock:us-east-1:086734376398:imported-model/your-qwen2-model

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. Test it!

# "model" must match the 'model_name' in the config
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "Qwen2-72B",
    "messages": [
      {
        "role": "user",
        "content": "what llm are you"
      }
    ]
  }'

OpenAI-Compatible Imported Models (Qwen 2.5 VL, etc.)

Use this route for Bedrock imported models that follow the OpenAI Chat Completions API spec. This includes models like Qwen 2.5 VL that accept OpenAI-formatted messages with support for vision (images), tool calling, and other OpenAI features.

| Property | Details |
|---|---|
| Provider Route | bedrock/openai/{model_arn} |
| Provider Documentation | Bedrock Imported Models |
| Supported Features | Vision (images), tool calling, streaming, system messages |

LiteLLM SDK Usage

Basic Usage

from litellm import completion

response = completion(
    model="bedrock/openai/arn:aws:bedrock:us-east-1:046319184608:imported-model/0m2lasirsp6z",  # bedrock/openai/{your-model-arn}
    messages=[{"role": "user", "content": "Tell me a joke"}],
    max_tokens=300,
    temperature=0.5,
)

With Vision (Images)

import base64
from litellm import completion

# Load the image and encode it as base64
with open("image.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = completion(
    model="bedrock/openai/arn:aws:bedrock:us-east-1:046319184608:imported-model/0m2lasirsp6z",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that can analyze images.",
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    # Images are passed inline as a base64 data URL
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_base64}"},
                },
            ],
        },
    ],
    max_tokens=300,
    temperature=0.5,
)

Comparing Multiple Images

import base64
from litellm import completion

# Load both images and encode them as base64
with open("image1.jpg", "rb") as f:
    image1_base64 = base64.b64encode(f.read()).decode("utf-8")
with open("image2.jpg", "rb") as f:
    image2_base64 = base64.b64encode(f.read()).decode("utf-8")

response = completion(
    model="bedrock/openai/arn:aws:bedrock:us-east-1:046319184608:imported-model/0m2lasirsp6z",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that can analyze images.",
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Spot the difference between these two images?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image1_base64}"},
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image2_base64}"},
                },
            ],
        },
    ],
    max_tokens=300,
    temperature=0.5,
)
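Both vision examples build the same message structure by hand: one text part followed by one `image_url` part per image. As a sketch, the hypothetical helpers below (`image_part` and `build_vision_message` are illustrative names, not LiteLLM functions) factor that out so any number of images can be attached. The byte strings in the example stand in for real image data.

```python
import base64


def image_part(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes as an OpenAI-style image_url content part."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def build_vision_message(prompt: str, images: list) -> dict:
    """Build a user message with one text part followed by the image parts."""
    content = [{"type": "text", "text": prompt}]
    content.extend(image_part(data) for data in images)
    return {"role": "user", "content": content}


# Placeholder bytes; in practice read them with open(path, "rb").
message = build_vision_message(
    "Spot the difference between these two images?",
    [b"fake-image-1", b"fake-image-2"],
)
print(len(message["content"]))
```

The returned dictionary can be placed in the `messages` list passed to `completion`, after the system message.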

LiteLLM Proxy Usage (AI Gateway)

1. Add to config

model_list:
  - model_name: qwen-25vl-72b
    litellm_params:
      model: bedrock/openai/arn:aws:bedrock:us-east-1:046319184608:imported-model/0m2lasirsp6z

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. Test it!

Basic text request:

curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen-25vl-72b",
    "messages": [
      {
        "role": "user",
        "content": "what llm are you"
      }
    ],
    "max_tokens": 300
  }'

With vision (image):

curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen-25vl-72b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that can analyze images."
      },
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {
            "type": "image_url",
            "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQSkZ..."}
          }
        ]
      }
    ],
    "max_tokens": 300,
    "temperature": 0.5
  }'

Moonshot Kimi K2 Thinking

Moonshot AI's Kimi K2 Thinking model is now available on Amazon Bedrock. This model features advanced reasoning capabilities with automatic reasoning content extraction.

| Property | Details |
|---|---|
| Provider Route | bedrock/moonshot.kimi-k2-thinking, bedrock/invoke/moonshot.kimi-k2-thinking |
| Provider Documentation | AWS Bedrock Moonshot Announcement |
| Supported Parameters | temperature, max_tokens, top_p, stream, tools, tool_choice |
| Special Features | Reasoning content extraction, Tool calling |

Supported Features

  • Reasoning Content Extraction: Automatically extracts <reasoning> tags and returns them as reasoning_content (similar to OpenAI's o1 models)
  • Tool Calling: Full support for function/tool calling with tool responses
  • Streaming: Both streaming and non-streaming responses
  • System Messages: System message support
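To make the reasoning-extraction feature concrete, here is a minimal sketch of the idea: text inside `<reasoning>` tags is separated from the rest of the output. This is an illustration under assumptions, not LiteLLM's actual implementation; `split_reasoning` is a hypothetical name and the sample string is made up.

```python
import re

# Matches each <reasoning>...</reasoning> block, including multi-line ones.
REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)


def split_reasoning(raw_text: str):
    """Return (content, reasoning_content); reasoning is None when no tags exist."""
    blocks = REASONING_RE.findall(raw_text)
    content = REASONING_RE.sub("", raw_text).strip()
    reasoning = "\n".join(block.strip() for block in blocks) if blocks else None
    return content, reasoning


content, reasoning_content = split_reasoning(
    "<reasoning>2 + 2 means adding two and two.</reasoning>The answer is 4."
)
print(content)
print(reasoning_content)
```

With LiteLLM this split has already happened by the time you receive the response: the answer is in `message.content` and the reasoning, when present, is in `message.reasoning_content`.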

Basic Usage

SDK

Moonshot Kimi K2 SDK Usage
from litellm import completion
import os

os.environ["AWS_ACCESS_KEY_ID"] = "your-aws-access-key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-aws-secret-key"
os.environ["AWS_REGION_NAME"] = "us-west-2"  # or your preferred region

# Basic completion
response = completion(
    model="bedrock/moonshot.kimi-k2-thinking",  # or bedrock/invoke/moonshot.kimi-k2-thinking
    messages=[
        {"role": "user", "content": "What is 2+2? Think step by step."}
    ],
    temperature=0.7,
    max_tokens=200,
)

print(response.choices[0].message.content)

# Access reasoning content if present
if response.choices[0].message.reasoning_content:
    print("Reasoning:", response.choices[0].message.reasoning_content)

Proxy

1. Add to config

config.yaml
model_list:
  - model_name: kimi-k2
    litellm_params:
      model: bedrock/moonshot.kimi-k2-thinking
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-west-2

2. Start proxy

Start LiteLLM Proxy
litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. Test it!

Test Kimi K2 via Proxy
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "kimi-k2",
    "messages": [
      {
        "role": "user",
        "content": "What is 2+2? Think step by step."
      }
    ],
    "temperature": 0.7,
    "max_tokens": 200
  }'

Tool Calling Example

Kimi K2 with Tool Calling
from litellm import completion
import os

os.environ["AWS_ACCESS_KEY_ID"] = "your-aws-access-key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-aws-secret-key"
os.environ["AWS_REGION_NAME"] = "us-west-2"

# Tool calling example
response = completion(
    model="bedrock/moonshot.kimi-k2-thinking",
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather in a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city name",
                        }
                    },
                    "required": ["location"],
                },
            },
        }
    ],
)

# The model may answer directly, so check for tool calls before reading them
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Tool called: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
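After the model requests a tool, the usual next step in the OpenAI-style flow is to run the tool and send its result back as a `tool` message. The sketch below shows only how that message could be built. `get_weather` here is a hypothetical local function returning canned data, `tool_result_message` is an illustrative helper, and `fake_call` is a stand-in for `response.choices[0].message.tool_calls[0]` so the example runs without AWS access.

```python
import json
from types import SimpleNamespace


def get_weather(location: str) -> str:
    """Hypothetical tool implementation; returns canned data as a JSON string."""
    return json.dumps({"location": location, "forecast": "sunny"})


def tool_result_message(tool_call) -> dict:
    """Run the requested tool and wrap its output as a 'tool' role message."""
    arguments = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,  # links the result to the model's request
        "name": tool_call.function.name,
        "content": get_weather(**arguments),
    }


# Stand-in for response.choices[0].message.tool_calls[0]
fake_call = SimpleNamespace(
    id="call_123",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Tokyo"}'),
)

tool_message = tool_result_message(fake_call)
print(tool_message)
```

To continue the conversation, append the assistant message that contained the tool call and then `tool_message` to `messages`, and call `completion` again with the same `model` and `tools`.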

Streaming Example

Kimi K2 Streaming
from litellm import completion
import os

os.environ["AWS_ACCESS_KEY_ID"] = "your-aws-access-key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-aws-secret-key"
os.environ["AWS_REGION_NAME"] = "us-west-2"

response = completion(
    model="bedrock/moonshot.kimi-k2-thinking",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    stream=True,
    temperature=0.7,
)

for chunk in response:
    delta = chunk.choices[0].delta

    # Print answer text as it arrives
    if delta.content:
        print(delta.content, end="")

    # Check for reasoning content in streaming
    if hasattr(delta, "reasoning_content") and delta.reasoning_content:
        print(f"\n[Reasoning: {delta.reasoning_content}]")
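If you need the full answer and the full reasoning as two strings rather than printing them as they arrive, the chunks can be accumulated. This is a sketch under assumptions: `collect_stream` is a hypothetical helper, and the fake chunks built with `SimpleNamespace` only imitate the `choices[0].delta` shape used in the example above so that it runs offline.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate streamed deltas into (content, reasoning_content) strings."""
    content_parts, reasoning_parts = [], []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta
        if getattr(delta, "content", None):
            content_parts.append(delta.content)
        if getattr(delta, "reasoning_content", None):
            reasoning_parts.append(delta.reasoning_content)
    return "".join(content_parts), "".join(reasoning_parts)


def fake_chunk(content=None, reasoning=None):
    """Build an object shaped like a streaming chunk (for illustration only)."""
    delta = SimpleNamespace(content=content, reasoning_content=reasoning)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    fake_chunk(reasoning="Keep it simple. "),
    fake_chunk(content="Qubits can "),
    fake_chunk(content="be in superposition."),
    fake_chunk(),  # final chunk with no text
]

answer, reasoning = collect_stream(fake_stream)
print(answer)
print(reasoning)
```

In real use, pass the iterator returned by `completion(..., stream=True)` in place of `fake_stream`.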

Supported Parameters

| Parameter | Type | Description | Supported |
|---|---|---|---|
| temperature | float (0-1) | Controls randomness in output | ✅ |
| max_tokens | integer | Maximum tokens to generate | ✅ |
| top_p | float | Nucleus sampling parameter | ✅ |
| stream | boolean | Enable streaming responses | ✅ |
| tools | array | Tool/function definitions | ✅ |
| tool_choice | string/object | Tool choice specification | ✅ |
| stop | array | Stop sequences | ❌ (Not supported on Bedrock) |
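Because `stop` is not supported for this model, it can help to check request parameters before sending them. The sketch below is a hypothetical client-side check, not a LiteLLM feature: `SUPPORTED_PARAMS` is copied from the table above and `split_supported` is an illustrative name.

```python
# Parameter names marked as supported in the table above.
SUPPORTED_PARAMS = {"temperature", "max_tokens", "top_p", "stream", "tools", "tool_choice"}


def split_supported(params: dict):
    """Return (supported_params, dropped_names) for a dict of request parameters."""
    supported = {name: value for name, value in params.items() if name in SUPPORTED_PARAMS}
    dropped = sorted(name for name in params if name not in SUPPORTED_PARAMS)
    return supported, dropped


supported, dropped = split_supported(
    {"temperature": 0.7, "max_tokens": 200, "stop": ["\n\n"]}
)
print(supported)
print(dropped)
```

The `supported` dictionary can then be unpacked into the call, for example `completion(model=..., messages=..., **supported)`.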