Novita Pricing

Per-token pricing for 68 open-source LLMs on Novita AI — a low-cost serverless inference provider with a broad model catalog and per-model quantization options.

Last updated: Jun 28, 2026

👁 The Grid | Spot Priced LLM API

The Grid | Spot Priced LLM APISponsored

Change 3 lines of code and let providers compete for your requests in real time.

Get started for free

Get our weekly newsletter on pricing changes, new releases, and tools.

Novita offers free models

$0.50 — see the full Novita free tier guide

Models

$0.01

Cheapest Input/1M

$0.03

Cheapest Output/1M

Model Authors

Sponsored 👁 The Grid | Spot Priced LLM API

The Grid | Spot Priced LLM API — Why pay list price when providers will bid for your API usage? Up to 80% off.

View spot prices

Provider	Model	Context	Quant	Input/1M	Output/1M
I Inclusionai	Ling 2.6 Flash	262k	—	$0.010	$0.030
M Meta-llama	Llama 3.1 8B Instruct	16k	fp8	$0.020	$0.050
MI Mistral AI	Mistral Nemo	60k	fp8	$0.040	$0.170
O OpenAI	GPT-OSS-20b	131k	fp4	$0.040	$0.150
NV Nvidia	Nemotron 3 Nano 30B A3B	262k	fp4	$0.050	$0.200
O OpenAI	GPT-OSS-120b	131k	fp4	$0.050	$0.250
S1 Sao10k	Llama 3 8B Lunaris	8k	bf16	$0.050	$0.050
QW Qwen	Qwen3 Coder 30B A3B Instruct	160k	fp8	$0.070	$0.270
Z Z-ai	GLM-4.7-Flash	200k	bf16	$0.070	$0.400
NV Novita	inclusionAI: Ling-2.6-1T	262k	—	$0.075	$0.625
NV Novita	inclusionAI: Ring-2.6-1T	262k	—	$0.075	$0.625
QW Qwen	Qwen3 VL 8B Instruct	131k	fp8	$0.080	$0.500
QW Qwen	Qwen3 235B A22B Instruct 2507	131k	fp8	$0.090	$0.580
G Google	Gemma 3 27B	98k	bf16	$0.119	$0.200
G Google	Gemma 4 26B A4B Instruct	262k	bf16	$0.130	$0.400
Z Z-ai	GLM 4.5 Air	131k	bf16	$0.130	$0.850
M Meta-llama	Llama 3.3 70B Instruct	131k	bf16	$0.135	$0.400
DS Deepseek	DeepSeek V4 Flash (Non-Reasoning)	1049k	fp8	$0.140	$0.280
G Google	Gemma 4 31B Instruct	262k	bf16	$0.140	$0.400
QW Qwen	Qwen3 Next 80B A3B Instruct	131k	bf16	$0.150	$1.500
QW Qwen	Qwen3 Next 80B A3B Thinking	131k	bf16	$0.150	$1.500
M Meta-llama	Llama 4 Scout	131k	bf16	$0.180	$0.590
QW Qwen	Qwen3 Coder Next	262k	fp8	$0.200	$1.500
QW Qwen	Qwen3 VL 30B A3B Instruct	131k	bf16	$0.200	$0.700
QW Qwen	Qwen3 VL 30B A3B Thinking	131k	fp16	$0.200	$1.000
S StepFun	stepfun/step-3.7-flash	262k	—	$0.200	$1.150
DS Deepseek	DeepSeek V3.2	164k	fp8	$0.269	$0.400
DS Deepseek	DeepSeek V3 0324	164k	fp8	$0.270	$1.120
DS Deepseek	DeepSeek V3.1	131k	fp8	$0.270	$1.000
DS Deepseek	DeepSeek V3.1 Terminus	131k	fp8	$0.270	$1.000
DS Deepseek	DeepSeek V3.2 Exp	164k	fp8	$0.270	$0.410
M Meta-llama	Llama 4 Maverick	1049k	fp8	$0.270	$0.850
MM Minimax	MiniMax M2.7	205k	fp8	$0.270	$1.080
MM Minimax	MiniMax M2	205k	fp8	$0.300	$1.200
MM Minimax	MiniMax M2.1	205k	fp8	$0.300	$1.200
MM Minimax	MiniMax M2.5	205k	fp8	$0.300	$1.200
MM Minimax	MiniMax M3	1000k	—	$0.300	$1.200
QW Qwen	Qwen3 235B A22B Thinking 2507	131k	fp8	$0.300	$3.000
QW Qwen	Qwen3.5-27B	262k	bf16	$0.300	$2.400
QW Qwen	Qwen3 VL 235B A22B Instruct	131k	bf16	$0.300	$1.500
Z Z-ai	GLM 4.6V	131k	bf16	$0.300	$0.900
QW Qwen	Qwen2.5 72B Instruct	32k	bf16	$0.380	$0.400
QW Qwen	Qwen3 Coder 480B A35B (exacto)	262k	fp8	$0.380	$1.550
DS Deepseek	DeepSeek V3	64k	fp8	$0.400	$1.300
QW Qwen	Qwen3.5-122B-A10B	262k	bf16	$0.400	$3.200
BD Baidu	ERNIE 4.5 VL 424B A47B	123k	fp16	$0.420	$1.250
X Xiaomi	MiMo v2.5 Pro	1049k	—	$0.522	$1.044
Z Z-ai	GLM 4.7	205k	fp8	$0.540	$1.980
MM Minimax	MiniMax M1	1000k	bf16	$0.550	$2.200
Z Z-ai	GLM 4.6	205k	bf16	$0.550	$2.200
MS Moonshotai	Kimi K2 0711	131k	fp8	$0.570	$2.300
MS Moonshotai	Kimi K2.5	262k	—	$0.570	$2.850
MS Moonshotai	Kimi K2 0905 (exacto)	262k	fp8	$0.600	$2.500
MS Moonshotai	Kimi K2 Thinking	262k	bf16	$0.600	$2.500
QW Qwen	Qwen3.5 397B A17B	262k	—	$0.600	$3.600
Z Z-ai	GLM 4.5V	66k	fp8	$0.600	$1.800
MS Microsoft	WizardLM-2 8x22B	66k	bf16	$0.620	$0.620
DS Deepseek	R1	64k	fp8	$0.700	$2.500
DS Deepseek	R1 0528	164k	fp8	$0.700	$2.500
DS Deepseek	R1 Distill Llama 70B	8k	bf16	$0.800	$0.800
MS Moonshotai	Kimi K2.6	262k	—	$0.800	$3.400
MS Moonshotai	Kimi K2.7 Code	262k	—	$0.950	$4.000
QW Qwen	Qwen3 VL 235B A22B Thinking	131k	bf16	$0.980	$3.950
Z Z-ai	GLM 5	203k	fp8	$1.000	$3.200
Z Z-ai	GLM-5.2	1049k	fp8	$1.260	$3.960
Z Z-ai	GLM 5.1	205k	fp8	$1.380	$4.400
S1 Sao10k	Llama 3.1 Euryale 70B v2.2	8k	fp8	$1.480	$1.480
DS Deepseek	DeepSeek V4 Pro	1049k	fp8	$1.600	$3.200

Pricing for Novita endpoints aggregated via OpenRouter. Prices per 1M tokens.

About Novita AI

Novita AI is a serverless inference cloud offering cheap, pay-per-token access to a broad catalog of open-source models — Llama, Qwen, DeepSeek, GLM, gpt-oss and more. Many models are offered at multiple quantizations (fp8, fp16, fp4), letting you trade a little quality for lower cost. The API is OpenAI-compatible, so switching is usually a base-URL change.

Beyond LLMs, Novita also runs GPU instances and image/video models. For most teams the serverless per-token tier below is the cheapest way to run open-weight models, and one of the widest catalogs of any single provider.

Novita Pricing FAQ

How much does Novita cost?

Novita charges per token, with input prices starting around $0.01–$0.10 per million tokens for smaller open models. See the live table above for current per-model input and output prices.

What is quantization and why does it matter?

Quantization (fp8, fp4, int4) compresses a model so it runs cheaper and faster with a small quality trade-off. Novita serves many models at multiple quantizations — the table shows which quantization each price applies to.

Is Novita cheaper than other providers?

For open-weight models Novita is among the cheapest serverless options, especially at fp8/fp4 quantizations. Use the comparison links below to see Novita vs other providers on the same model.