DeepInfra Pricing
Per-token pricing for 72 open-source LLMs on DeepInfra — one of the cheapest serverless inference providers. Pay only for the tokens you use, with no minimums and no charge for idle time.
Last updated: Jun 28, 2026
DeepInfra free tier & credits
No ongoing free tier — see credits & cheapest models — see the full DeepInfra free tier guide
Provider | Model | Input/1M | Output/1M |
|---|---|---|---|
Z | $0.060 | $0.400 | |
$0.070 | $0.340 | ||
NV | $0.085 | $0.400 | |
$0.090 | $0.300 | ||
DS | $0.100 | $0.200 | |
$0.130 | $0.380 | ||
MM | $0.150 | $1.150 | |
QW | $0.150 | $0.950 | |
NV | $0.200 | $0.800 | |
DS | $0.260 | $0.380 | |
$0.400 | $2.000 | ||
$0.450 | $2.250 | ||
QW | $0.450 | $3.000 | |
nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B | $0.500 | $2.200 | |
Z | $0.600 | $2.080 | |
$0.740 | $3.500 | ||
$0.750 | $3.500 | ||
Z | $0.950 | $3.000 | |
$1.000 | $3.000 | ||
Z | $1.050 | $3.500 | |
QW | $1.200 | $6.000 | |
QW | $1.200 | $6.000 | |
DS | $1.300 | $2.600 | |
FLUX 1.1 Pro | $N/A | $N/A | |
FLUX.1 Dev | $N/A | $N/A | |
FLUX.1 Kontext Dev | $N/A | $N/A | |
FLUX.1 Redux Dev | $N/A | $N/A | |
FLUX.1 Schnell | $N/A | $N/A | |
FLUX.2 Dev | $N/A | $N/A | |
FLUX.2 Klein 4B | $N/A | $N/A | |
FLUX.2 Klein 9B | $N/A | $N/A | |
FLUX.2 Max | $N/A | $N/A | |
FLUX.2 Pro | $N/A | $N/A | |
FLUX.1 Pro | $N/A | $N/A | |
B | Bria Blur Background | $N/A | $N/A |
B | Bria 3.2 | $N/A | $N/A |
B | Bria 3.2 Vector | $N/A | $N/A |
B | Bria Enhance | $N/A | $N/A |
B | Bria Erase | $N/A | $N/A |
B | Erase Foreground | $N/A | $N/A |
B | Bria Expand | $N/A | $N/A |
B | Bria/fibo | $N/A | $N/A |
B | Bria Fibo Edit | $N/A | $N/A |
B | Gen Fill | $N/A | $N/A |
B | Bria Remove Background | $N/A | $N/A |
B | Replace Background | $N/A | $N/A |
Seedream 4.0 | $N/A | $N/A | |
Seedream 4.5 | $N/A | $N/A | |
ClarityAI/creative | $N/A | $N/A | |
ClarityAI/crystal | $N/A | $N/A | |
Flux | $N/A | $N/A | |
DS | Janus Pro 1B | $N/A | $N/A |
DS | Janus Pro 7B | $N/A | $N/A |
Pixverse/Pixverse-T2V | $N/A | $N/A | |
Pixverse/Pixverse-T2V-HD | $N/A | $N/A | |
PrunaAI/p-image | $N/A | $N/A | |
PrunaAI/p-image | $N/A | $N/A | |
Qwen/Qwen3-TTS | $N/A | $N/A | |
Qwen/Qwen3-TTS-VoiceDesign | $N/A | $N/A | |
QW | Qwen Image Edit | $N/A | $N/A |
QW | Qwen Image Edit Max | $N/A | $N/A |
QW | Qwen Image Max | $N/A | $N/A |
run-diffusion/Juggernaut-Flux | $N/A | $N/A | |
Juggernaut Lightning Flux | $N/A | $N/A | |
stabilityai/sd3.5 | $N/A | $N/A | |
stabilityai/sd3.5-medium | $N/A | $N/A | |
stabilityai/sdxl-turbo | $N/A | $N/A | |
Wan-AI/Wan2.1-T2V-1.3B | $N/A | $N/A | |
Wan-AI/Wan2.1-T2V-14B | $N/A | $N/A | |
Wan-AI/Wan2.6-Image-Edit | $N/A | $N/A | |
Wan2.6 T2I | $N/A | $N/A | |
Wan-AI/Wan2.7-Image-Edit | $N/A | $N/A |
Pricing from DeepInfra. Prices per 1M tokens.
About DeepInfra
DeepInfra runs serverless inference for open-source models — Llama, Qwen, DeepSeek, GLM, Gemma and more — at some of the lowest per-token prices on the market. It's a pure pay-as-you-go API: you're billed only for input and output tokens, with no idle/uptime charges and an OpenAI-compatible endpoint, so switching is usually a base-URL change.
Beyond serverless LLMs, DeepInfra also offers dedicated GPU instances for high-throughput or private workloads, plus embedding and image models. For most teams, the serverless per-token tier below is the cheapest way to run open-weight models in production.
DeepInfra Pricing FAQ
How much does DeepInfra cost?
DeepInfra charges per token, with input prices starting around a few cents per million tokens for smaller open models. See the live table above for current per-model input and output prices.
Is DeepInfra cheaper than OpenAI or Together?
For open-weight models (Llama, Qwen, DeepSeek, etc.) DeepInfra is usually among the cheapest serverless options. Use the comparison links below to see DeepInfra vs other providers on the same models.
Does DeepInfra charge for idle time?
No. Serverless inference is pure pay-per-token — you pay only for the input and output tokens you actually use. Dedicated GPU instances are billed separately by the hour.
Which models does DeepInfra support?
A broad catalog of open-source LLMs plus embedding and image models. The table above lists the models DeepInfra currently serves with public per-token pricing.
