
URL: https://deepinfra.com/models/priority

Models | Machine Learning Inference | DeepInfra




Browse DeepInfra models:

All categories and models you can try out and use directly in DeepInfra:

featured
GLM-5.2

GLM-5.2 is Z-AI's latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a **solid 1M-token context**.
Priority
fp4
1024k
$0.18 cached, $0.95 in, $3.00 out / 1M
featured
Kimi-K2.7-Code

Kimi K2.7 Code is a coding-focused agentic model built upon Kimi K2.6. With substantial improvements on real-world long-horizon coding tasks, it strengthens end-to-end task completion across complex software engineering workflows while improving token efficiency, reducing thinking-token usage by approximately 30% compared with Kimi K2.6.
Priority
fp4
256k
$0.15 cached, $0.74 in, $3.50 out / 1M
featured
NVIDIA-Nemotron-3-Ultra-550B-A55B

Nemotron 3 Ultra is built for frontier reasoning, orchestration, coding agents, deep research, and complex enterprise workflows. It delivers up to 5x faster inference and up to 30% lower cost for agentic workloads while supporting up to 1M-token context.
Priority
256k
$0.10 cached, $0.50 in, $2.20 out / 1M
featured
Nemotron-3-Nano-Omni-30B-A3B-Reasoning

Nemotron 3 Nano Omni is an open multimodal model built on a hybrid Mixture-of-Experts (MoE) architecture, engineered for high efficiency and strong accuracy across image, video, audio, and text inputs. It powers always-on sub-agents for computer use, document intelligence, and audio-video understanding, replacing fragmented vision, speech, and language pipelines with a single unified inference pass.
Priority
bfloat16
256k
$0.20 in, $0.80 out / 1M
featured
DeepSeek-V4-Flash

DeepSeek V4 Flash is an efficiency-focused MoE model with 284B total parameters (13B active) and a 1M-token context window. It's tuned for fast inference and high-throughput use cases while still holding up on reasoning and coding tasks.
Priority
fp4
1024k
$0.02 cached, $0.10 in, $0.20 out / 1M
featured
DeepSeek-V4-Pro

DeepSeek V4 Pro is an MoE model with 1.6T total parameters (49B active) and a 1M-token context window. It's built for advanced reasoning, coding, and long-running agent tasks, and performs well on knowledge, math, and software engineering benchmarks.
Priority
fp4
1024k
$0.10 cached, $1.30 in, $2.60 out / 1M
featured
Kimi-K2.6

Kimi K2.6 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration.
Priority
fp4
256k
$0.15 cached, $0.75 in, $3.50 out / 1M
featured
MiMo-V2.5

MiMo-V2.5 is a native omnimodal model with strong agentic capabilities, supporting text, image, video, and audio understanding within a unified architecture. Built upon the MiMo-V2-Flash backbone and extended with dedicated vision and audio encoders, it delivers robust performance across multimodal perception, long-context reasoning, and agentic workflows.
Priority
256k
$0.08 cached, $0.40 in, $2.00 out / 1M
featured
MiMo-V2.5-Pro

MiMo-V2.5-Pro is an open-source Mixture-of-Experts (MoE) language model with 1.02T total parameters and 42B active parameters. It uses the hybrid attention architecture and 3-layer Multi-Token Prediction (MTP) introduced in [MiMo-V2-Flash](https://github.com/XiaomiMiMo/MiMo-V2-Flash).
Priority
fp8
1024k
$0.20 cached, $1.00 in, $3.00 out / 1M
featured
Qwen3.6-35B-A3B

Qwen3.6-35B-A3B is Alibaba's latest flagship Mixture-of-Experts model, with 35B total parameters and only 3B activated per token (256 experts, 8 routed + 1 shared). Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience.
Priority
fp8
256k
$0.15 in, $0.95 out / 1M
featured
Qwen3.5-397B-A17B

Qwen3.5-397B-A17B is Alibaba's most capable Qwen3.5 model, a Mixture-of-Experts architecture with 397B total parameters and 17B activated per token. It features a 262K-token context window (extensible to 1M with YaRN), a thinking/reasoning mode, tool calling with MCP integration, and support for 201 languages. It sets state-of-the-art results on reasoning, coding, math, and multimodal benchmarks.
Priority
fp8
256k
$0.22 cached, $0.45 in, $3.00 out / 1M
featured
gemma-4-26B-A4B-it

Efficient MoE variant of Gemma 4. Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input and generating text output.
Priority
fp8
256k
$0.07 in, $0.34 out / 1M
featured
gemma-4-31B-it

Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input and generating text output.
Priority
fp8
256k
$0.13 in, $0.38 out / 1M
featured
NVIDIA-Nemotron-3-Super-120B-A12B

NVIDIA Nemotron 3 Super is a hybrid Mixture-of-Experts (MoE) model engineered for the highest compute efficiency and accuracy in multi-agent applications and specialized agentic systems. It is optimized to run many collaborating agents per application on a single GPU, delivering high accuracy for reasoning, tool use, and instruction following.
Priority
bfloat16
256k
$0.085 in, $0.40 out / 1M
featured
MiniMax-M2.5

MiniMax M2.5 is SOTA in coding, agentic tool use and search, office work, and a range of other economically valuable tasks, boasting scores of 80.2% in SWE-Bench Verified, 51.3% in Multi-SWE-Bench, and 76.3% in BrowseComp (with context management).
Priority
fp8
192k
$0.03 cached, $0.15 in, $1.15 out / 1M
featured
Kimi-K2.5

Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base. It seamlessly integrates vision and language understanding with advanced agentic capabilities, instant and thinking modes, as well as conversational and agentic paradigms.
Priority
fp4
256k
$0.07 cached, $0.45 in, $2.25 out / 1M
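The price lines on the cards above quote USD per 1M tokens, split into cached, input, and output rates. As a rough illustration of how those three rates combine into a per-request cost, here is a minimal sketch. The `Pricing` class and `estimate_cost` function are hypothetical names, not part of any DeepInfra SDK, and the sketch assumes cached tokens are a subset of the input tokens billed at the cached rate instead of the input rate; actual billing rules may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    """Per-1M-token prices in USD, as shown on a model card (hypothetical helper)."""
    input_per_m: float
    output_per_m: float
    cached_per_m: Optional[float] = None  # None when the card lists no cached price


def estimate_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached_tokens is the part of input_tokens served from a
    prompt cache and is billed at the cached rate instead of the input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    # Fall back to the input rate when the card lists no cached price.
    cached_rate = (pricing.cached_per_m
                   if pricing.cached_per_m is not None
                   else pricing.input_per_m)
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * pricing.input_per_m
             + cached_tokens * cached_rate
             + output_tokens * pricing.output_per_m)
    return total / 1_000_000


# Prices copied from the DeepSeek-V4-Flash card: $0.02 cached, $0.10 in, $0.20 out / 1M.
flash = Pricing(input_per_m=0.10, output_per_m=0.20, cached_per_m=0.02)
cost = estimate_cost(flash, input_tokens=200_000, output_tokens=50_000,
                     cached_tokens=100_000)
print(f"${cost:.4f}")
```

With these example token counts, half of the prompt is billed at the cached rate, so the estimate is lower than it would be for the same request with no cache hits.
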
Hermes-3-Llama-3.1-70B

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.
Priority
fp8
128k
$0.70 / 1M tokens
Qwen3-14B

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
Priority
fp8
40k
$0.12 in, $0.24 out / 1M
Qwen3-235B-A22B-Instruct-2507

Qwen3-235B-A22B-Instruct-2507 is the updated version of the Qwen3-235B-A22B non-thinking mode, featuring significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.
Priority
fp8
256k
$0.09 in, $0.10 out / 1M
Qwen3-235B-A22B-Thinking-2507

Qwen3-235B-A22B-Thinking-2507 is a new Qwen3 model that scales up the thinking capability of Qwen3-235B-A22B, improving both the quality and depth of reasoning.
Priority
fp8
256k
$0.20 cached, $0.23 in, $2.30 out / 1M
Qwen3-30B-A3B

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
Priority
fp8
40k
$0.12 in, $0.50 out / 1M
Qwen3-32B

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.
Priority
fp8
40k
$0.08 in, $0.28 out / 1M
Qwen3-Next-80B-A3B-Instruct

Over the past few months, we have observed increasingly clear trends toward scaling both total parameters and context lengths in the pursuit of more powerful and agentic artificial intelligence (AI). We are excited to share our latest advancements in addressing these demands, centered on improving scaling efficiency through innovative model architecture. We call these next-generation foundation models Qwen3-Next.
Priority
fp8
256k
$0.09 in, $1.10 out / 1M
Qwen3-VL-235B-A22B-Instruct

Meet Qwen3-VL, the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.
Priority
fp8
256k
$0.11 cached, $0.20 in, $0.88 out / 1M
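The price lines in this listing come in three shapes: cached/in/out, in/out only, and a single unlabeled price such as "$0.70 / 1M tokens". A minimal sketch of turning such a line into structured data might look like the following. The function name `parse_price_line` and the `"flat"` key for an unlabeled price are assumptions made for illustration; the listing itself does not say whether an unlabeled price applies to input, output, or both.

```python
import re
from typing import Dict

# A dollar amount, optionally followed by one of the labels used on the cards.
_PRICE = re.compile(r"\$(\d+(?:\.\d+)?)\s*(?:(cached|in|out)\b)?")


def parse_price_line(line: str) -> Dict[str, float]:
    """Parse a card price line into a {label: USD per 1M tokens} mapping.

    A price with no label (e.g. "$0.70 / 1M tokens") is stored under the
    hypothetical key "flat".
    """
    prices: Dict[str, float] = {}
    for amount, label in _PRICE.findall(line):
        prices[label or "flat"] = float(amount)
    if not prices:
        raise ValueError(f"no price found in: {line!r}")
    return prices


# Lines copied verbatim from the listing.
full = parse_price_line("$0.18 cached, $0.95 in, $3.00 out / 1M")
no_cache = parse_price_line("$0.085 in, $0.40 out / 1M")
flat = parse_price_line("$0.70 / 1M tokens")
print(full, no_cache, flat)
```

Keeping the labels as dictionary keys means a missing cached price shows up as an absent key rather than a guessed value.
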

© 2026 DeepInfra. All rights reserved.