VOOZH about

URL: https://pricepertoken.com/endpoints/deepinfra

⇱ DeepInfra Pricing – Cheapest Serverless LLM Inference | Price Per Token


New: give your AI agents live LLM pricing & benchmarks with the Price Per Token MCP. Get the MCP

DeepInfra Pricing

Per-token pricing for 72 open-source LLMs on DeepInfra — one of the cheapest serverless inference providers. Pay only for the tokens you use, with no minimums and no charge for idle time.

Last updated: Jun 28, 2026

👁 The Grid | Spot Priced LLM API
The Grid | Spot Priced LLM APISponsored

Change 3 lines of code and let providers compete for your requests in real time.

Get started for free

Get our weekly newsletter on pricing changes, new releases, and tools.

DeepInfra free tier & credits

No ongoing free tier — see credits & cheapest models — see the full DeepInfra free tier guide

72
Models
$0.06
Cheapest Input/1M
$0.20
Cheapest Output/1M
17
Model Authors
Sponsored 👁 The Grid | Spot Priced LLM API
The Grid | Spot Priced LLM API — Why pay list price when providers will bid for your API usage? Up to 80% off.
View spot prices
Provider
Model
Input/1M
Output/1M
$0.060
$0.400
$0.070
$0.340
$0.085
$0.400
$0.090
$0.300
$0.100
$0.200
$0.130
$0.380
$0.150
$1.150
$0.150
$0.950
$0.200
$0.800
$0.260
$0.380
$0.400
$2.000
$0.450
$2.250
$0.450
$3.000
nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B
$0.500
$2.200
$0.600
$2.080
$0.740
$3.500
$0.750
$3.500
$0.950
$3.000
$1.000
$3.000
$1.050
$3.500
$1.200
$6.000
$1.200
$6.000
$1.300
$2.600
FLUX 1.1 Pro
$N/A
$N/A
FLUX.1 Dev
$N/A
$N/A
FLUX.1 Kontext Dev
$N/A
$N/A
FLUX.1 Redux Dev
$N/A
$N/A
FLUX.1 Schnell
$N/A
$N/A
FLUX.2 Dev
$N/A
$N/A
FLUX.2 Klein 4B
$N/A
$N/A
FLUX.2 Klein 9B
$N/A
$N/A
FLUX.2 Max
$N/A
$N/A
FLUX.2 Pro
$N/A
$N/A
FLUX.1 Pro
$N/A
$N/A
Bria Blur Background
$N/A
$N/A
Bria 3.2
$N/A
$N/A
Bria 3.2 Vector
$N/A
$N/A
Bria Enhance
$N/A
$N/A
Bria Erase
$N/A
$N/A
Erase Foreground
$N/A
$N/A
Bria Expand
$N/A
$N/A
Bria/fibo
$N/A
$N/A
Bria Fibo Edit
$N/A
$N/A
Gen Fill
$N/A
$N/A
Bria Remove Background
$N/A
$N/A
Replace Background
$N/A
$N/A
Seedream 4.0
$N/A
$N/A
Seedream 4.5
$N/A
$N/A
ClarityAI/creative
$N/A
$N/A
ClarityAI/crystal
$N/A
$N/A
Flux
$N/A
$N/A
Janus Pro 1B
$N/A
$N/A
Janus Pro 7B
$N/A
$N/A
Pixverse/Pixverse-T2V
$N/A
$N/A
Pixverse/Pixverse-T2V-HD
$N/A
$N/A
PrunaAI/p-image
$N/A
$N/A
PrunaAI/p-image
$N/A
$N/A
Qwen/Qwen3-TTS
$N/A
$N/A
Qwen/Qwen3-TTS-VoiceDesign
$N/A
$N/A
Qwen Image Edit
$N/A
$N/A
Qwen Image Edit Max
$N/A
$N/A
Qwen Image Max
$N/A
$N/A
run-diffusion/Juggernaut-Flux
$N/A
$N/A
Juggernaut Lightning Flux
$N/A
$N/A
stabilityai/sd3.5
$N/A
$N/A
stabilityai/sd3.5-medium
$N/A
$N/A
stabilityai/sdxl-turbo
$N/A
$N/A
Wan-AI/Wan2.1-T2V-1.3B
$N/A
$N/A
Wan-AI/Wan2.1-T2V-14B
$N/A
$N/A
Wan-AI/Wan2.6-Image-Edit
$N/A
$N/A
Wan2.6 T2I
$N/A
$N/A
Wan-AI/Wan2.7-Image-Edit
$N/A
$N/A

Pricing from DeepInfra. Prices per 1M tokens.

About DeepInfra

DeepInfra runs serverless inference for open-source models — Llama, Qwen, DeepSeek, GLM, Gemma and more — at some of the lowest per-token prices on the market. It's a pure pay-as-you-go API: you're billed only for input and output tokens, with no idle/uptime charges and an OpenAI-compatible endpoint, so switching is usually a base-URL change.

Beyond serverless LLMs, DeepInfra also offers dedicated GPU instances for high-throughput or private workloads, plus embedding and image models. For most teams, the serverless per-token tier below is the cheapest way to run open-weight models in production.

DeepInfra Pricing FAQ

How much does DeepInfra cost?

DeepInfra charges per token, with input prices starting around a few cents per million tokens for smaller open models. See the live table above for current per-model input and output prices.

Is DeepInfra cheaper than OpenAI or Together?

For open-weight models (Llama, Qwen, DeepSeek, etc.) DeepInfra is usually among the cheapest serverless options. Use the comparison links below to see DeepInfra vs other providers on the same models.

Does DeepInfra charge for idle time?

No. Serverless inference is pure pay-per-token — you pay only for the input and output tokens you actually use. Dedicated GPU instances are billed separately by the hour.

Which models does DeepInfra support?

A broad catalog of open-source LLMs plus embedding and image models. The table above lists the models DeepInfra currently serves with public per-token pricing.

Compare DeepInfra with Other Providers

DeepInfra Free Tier
Free models, credits & limits
DeepInfra vs Groq
Compare pricing & models
DeepInfra vs Together AI
Compare pricing & models
DeepInfra vs Fireworks AI
Compare pricing & models
DeepInfra vs Cerebras
Compare pricing & models
DeepInfra vs SambaNova
Compare pricing & models
DeepInfra vs Nebius AI
Compare pricing & models
DeepInfra vs Cloudflare Workers AI
Compare pricing & models
DeepInfra vs AWS Bedrock
Compare pricing & models
DeepInfra vs Azure OpenAI
Compare pricing & models
DeepInfra vs Google AI Studio
Compare pricing & models
DeepInfra vs OpenRouter
Compare pricing & models
DeepInfra vs Novita AI
Compare pricing & models

Built by @aellman

Follow us:

2026 68 Ventures, LLC. All rights reserved.