DeepInfra Pricing

Per-token pricing for 72 open-source LLMs on DeepInfra — one of the cheapest serverless inference providers. Pay only for the tokens you use, with no minimums and no charge for idle time.

Last updated: Jun 28, 2026

👁 The Grid | Spot Priced LLM API

The Grid | Spot Priced LLM APISponsored

Change 3 lines of code and let providers compete for your requests in real time.

Get started for free

Get our weekly newsletter on pricing changes, new releases, and tools.

DeepInfra free tier & credits

No ongoing free tier — see credits & cheapest models — see the full DeepInfra free tier guide

Models

$0.06

Cheapest Input/1M

$0.20

Cheapest Output/1M

Model Authors

Sponsored 👁 The Grid | Spot Priced LLM API

The Grid | Spot Priced LLM API — Why pay list price when providers will bid for your API usage? Up to 80% off.

View spot prices

Provider	Model	Input/1M	Output/1M
Z Z-ai	GLM-4.7-Flash	$0.060	$0.400
G Google	Gemma 4 26B A4B Instruct	$0.070	$0.340
NV Nvidia	Nemotron-3 Super 120B A12B	$0.085	$0.400
S StepFun	Step 3.5 Flash	$0.090	$0.300
DS Deepseek	DeepSeek V4 Flash (Non-Reasoning)	$0.100	$0.200
G Google	Gemma 4 31B Instruct	$0.130	$0.380
MM Minimax	MiniMax M2.5	$0.150	$1.150
QW Qwen	Qwen3.6 35B A3B	$0.150	$0.950
NV Nvidia	Nemotron 3 Super 120B A12B	$0.200	$0.800
DS Deepseek	DeepSeek V3.2	$0.260	$0.380
X Xiaomi	MiMo v2.5	$0.400	$2.000
MS Moonshotai	Kimi K2.5	$0.450	$2.250
QW Qwen	Qwen3.5 397B A17B	$0.450	$3.000
DI DeepInfra	nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B	$0.500	$2.200
Z Z-ai	GLM 5	$0.600	$2.080
MS Moonshotai	Kimi K2.7 Code	$0.740	$3.500
MS Moonshotai	Kimi K2.6	$0.750	$3.500
Z Z-ai	GLM-5.2	$0.950	$3.000
X Xiaomi	MiMo v2.5 Pro	$1.000	$3.000
Z Z-ai	GLM 5.1	$1.050	$3.500
QW Qwen	Qwen3 Max	$1.200	$6.000
QW Qwen	Qwen3 Max Thinking	$1.200	$6.000
DS Deepseek	DeepSeek V4 Pro	$1.300	$2.600
B Black Forest Labs	FLUX 1.1 Pro	$N/A	$N/A
B Black Forest Labs	FLUX.1 Dev	$N/A	$N/A
B Black Forest Labs	FLUX.1 Kontext Dev	$N/A	$N/A
B Black Forest Labs	FLUX.1 Redux Dev	$N/A	$N/A
B Black Forest Labs	FLUX.1 Schnell	$N/A	$N/A
B Black Forest Labs	FLUX.2 Dev	$N/A	$N/A
B Black Forest Labs	FLUX.2 Klein 4B	$N/A	$N/A
B Black Forest Labs	FLUX.2 Klein 9B	$N/A	$N/A
B Black Forest Labs	FLUX.2 Max	$N/A	$N/A
B Black Forest Labs	FLUX.2 Pro	$N/A	$N/A
B Black Forest Labs	FLUX.1 Pro	$N/A	$N/A
B Bria	Bria Blur Background	$N/A	$N/A
B Bria	Bria 3.2	$N/A	$N/A
B Bria	Bria 3.2 Vector	$N/A	$N/A
B Bria	Bria Enhance	$N/A	$N/A
B Bria	Bria Erase	$N/A	$N/A
B Bria	Erase Foreground	$N/A	$N/A
B Bria	Bria Expand	$N/A	$N/A
B Bria	Bria/fibo	$N/A	$N/A
B Bria	Bria Fibo Edit	$N/A	$N/A
B Bria	Gen Fill	$N/A	$N/A
B Bria	Bria Remove Background	$N/A	$N/A
B Bria	Replace Background	$N/A	$N/A
BY ByteDance Seed	Seedream 4.0	$N/A	$N/A
BY ByteDance Seed	Seedream 4.5	$N/A	$N/A
C Clarityai	ClarityAI/creative	$N/A	$N/A
C Clarityai	ClarityAI/crystal	$N/A	$N/A
C Clarityai	Flux	$N/A	$N/A
DS Deepseek	Janus Pro 1B	$N/A	$N/A
DS Deepseek	Janus Pro 7B	$N/A	$N/A
DI DeepInfra	Pixverse/Pixverse-T2V	$N/A	$N/A
DI DeepInfra	Pixverse/Pixverse-T2V-HD	$N/A	$N/A
P Prunaai	PrunaAI/p-image	$N/A	$N/A
P Prunaai	PrunaAI/p-image	$N/A	$N/A
DI DeepInfra	Qwen/Qwen3-TTS	$N/A	$N/A
DI DeepInfra	Qwen/Qwen3-TTS-VoiceDesign	$N/A	$N/A
QW Qwen	Qwen Image Edit	$N/A	$N/A
QW Qwen	Qwen Image Edit Max	$N/A	$N/A
QW Qwen	Qwen Image Max	$N/A	$N/A
DI DeepInfra	run-diffusion/Juggernaut-Flux	$N/A	$N/A
R RunDiffusion	Juggernaut Lightning Flux	$N/A	$N/A
DI DeepInfra	stabilityai/sd3.5	$N/A	$N/A
DI DeepInfra	stabilityai/sd3.5-medium	$N/A	$N/A
S Stability Ai	stabilityai/sdxl-turbo	$N/A	$N/A
DI DeepInfra	Wan-AI/Wan2.1-T2V-1.3B	$N/A	$N/A
DI DeepInfra	Wan-AI/Wan2.1-T2V-14B	$N/A	$N/A
W Wan Ai	Wan-AI/Wan2.6-Image-Edit	$N/A	$N/A
W Wan Ai	Wan2.6 T2I	$N/A	$N/A
DI DeepInfra	Wan-AI/Wan2.7-Image-Edit	$N/A	$N/A

Pricing from DeepInfra. Prices per 1M tokens.

About DeepInfra

DeepInfra runs serverless inference for open-source models — Llama, Qwen, DeepSeek, GLM, Gemma and more — at some of the lowest per-token prices on the market. It's a pure pay-as-you-go API: you're billed only for input and output tokens, with no idle/uptime charges and an OpenAI-compatible endpoint, so switching is usually a base-URL change.

Beyond serverless LLMs, DeepInfra also offers dedicated GPU instances for high-throughput or private workloads, plus embedding and image models. For most teams, the serverless per-token tier below is the cheapest way to run open-weight models in production.

DeepInfra Pricing FAQ

How much does DeepInfra cost?

DeepInfra charges per token, with input prices starting around a few cents per million tokens for smaller open models. See the live table above for current per-model input and output prices.

Is DeepInfra cheaper than OpenAI or Together?

For open-weight models (Llama, Qwen, DeepSeek, etc.) DeepInfra is usually among the cheapest serverless options. Use the comparison links below to see DeepInfra vs other providers on the same models.

Does DeepInfra charge for idle time?

No. Serverless inference is pure pay-per-token — you pay only for the input and output tokens you actually use. Dedicated GPU instances are billed separately by the hour.

Which models does DeepInfra support?

A broad catalog of open-source LLMs plus embedding and image models. The table above lists the models DeepInfra currently serves with public per-token pricing.