URL: https://willitrunai.com/browse?provider=DeepSeek

Browse 300+ AI Models for Local Inference | Will It Run AI


Browse AI Models

17 models available

DeepSeek V4 Pro
1600B (49B active) · 1.0M ctx · 976 GB · frontier
moe · Top tier

DeepSeek V4 Pro is a 1.6T-parameter sparse MoE (49B active, 384 routed + 1 shared expert) built for million-token agentic reasoning. Experts ship natively in FP4, so the real on-disk footprint is roughly 862 GB (FP4 experts + FP8 attention) rather than the multi-terabyte FP16 size. It is still a server/workstation deployment: realistic local use targets 8x 80GB GPUs or 1 TB+ unified memory, and at long Think Max contexts the KV cache dominates.
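The gap between the FP16 size and the FP4 footprint follows from simple bytes-per-parameter arithmetic. The sketch below is a back-of-the-envelope estimate, not the site's calculator: `weight_size_gb` is a hypothetical helper, it assumes every parameter is stored at one uniform bit width, and it uses 1 GB = 10^9 bytes.

```python
def weight_size_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Assumption: all parameters share one bit width. Real checkpoints mix
    precisions (the card mentions FP4 experts plus FP8 attention), so the
    true figure differs from this uniform estimate.
    """
    # (params_billion * 1e9 params) * (bits / 8 bytes) / (1e9 bytes per GB)
    return params_billion * bits_per_param / 8


fp16_gb = weight_size_gb(1600, 16)  # all 1.6T parameters at 16 bits
fp4_gb = weight_size_gb(1600, 4)    # all 1.6T parameters at 4 bits

print(f"FP16: {fp16_gb:.0f} GB, FP4: {fp4_gb:.0f} GB")
```

Under these assumptions the uniform FP16 estimate is 3200 GB and the uniform FP4 estimate is 800 GB. The card's roughly 862 GB sits a little above the FP4 figure, which is what one would expect when some tensors are kept at higher precision.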

DeepSeek V4 Flash
284B (13B active) · 1.0M ctx · 173.2 GB · frontier
moe · Top tier

DeepSeek V4 Flash is the lighter 284B-parameter sparse MoE sibling of V4 Pro (13B active, 256 routed + 1 shared expert) with the same 1M-token context. Experts ship natively in FP4, so the real on-disk footprint is roughly 158 GB rather than the roughly 568 GB FP16 size. It fits a single 192 GB unified-memory machine or a 2-4 GPU server while keeping near-frontier reasoning and coding quality.
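The "2-4 GPU server" range can be sanity-checked with a weights-only lower bound. This is a minimal sketch: `min_gpus_for_weights` is a hypothetical helper, and the 80 GB and 48 GB per-GPU capacities are illustrative assumptions rather than values stated for this model. It ignores KV cache, activations, and runtime overhead, so real deployments need more headroom.

```python
import math


def min_gpus_for_weights(weights_gb: float, vram_per_gpu_gb: float) -> int:
    """Smallest GPU count whose combined VRAM holds the weights alone.

    Lower bound only: KV cache, activations, and framework overhead are
    not included, and perfect sharding across GPUs is assumed.
    """
    return math.ceil(weights_gb / vram_per_gpu_gb)


FLASH_WEIGHTS_GB = 158  # approximate on-disk footprint quoted in the card

gpus_80gb = min_gpus_for_weights(FLASH_WEIGHTS_GB, 80)  # assumed 80 GB cards
gpus_48gb = min_gpus_for_weights(FLASH_WEIGHTS_GB, 48)  # assumed 48 GB cards
fits_192gb_unified = FLASH_WEIGHTS_GB <= 192

print(gpus_80gb, gpus_48gb, fits_192gb_unified)
```

With these assumed capacities the lower bound comes out at 2 GPUs (80 GB each) or 4 GPUs (48 GB each), and the weights fit under 192 GB with about 34 GB left for everything else.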

DeepSeek V3.2
671B (37B active) · 128K ctx · 409.3 GB · frontier
moe · Top tier

DeepSeek V3.2 is a 671B MoE model with 37B active parameters per token, using DeepSeek Sparse Attention and Multi-head Latent Attention. It has a 128K context window and is MIT licensed. Local inference requires a multi-GPU setup or a high-memory Mac.

DeepSeek Coder V2 236B
236B (21B active) · 131K ctx · 144 GB · current
moe · High

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks.

DeepSeek R1 671B
671B (37B active) · 131K ctx · 409.3 GB · frontier
moe · High

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

DeepSeek V3.1 671B
671B (37B active) · 131K ctx · 409.3 GB · frontier
moe · High

DeepSeek V3.1 (V3-0324) is a major update to the DeepSeek V3 family, with substantial improvements in instruction following, coding, creative writing, and agentic capabilities.

DeepSeek V2.5 236B
236B (21B active) · 131K ctx · 144 GB · current
moe · High

DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions. For model details, please visit the DeepSeek-V2 page.

DeepSeek V3 671B
671B (37B active) · 131K ctx · 409.3 GB · current
moe · High

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance.
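To make "671B total parameters with 37B activated for each token" concrete, the sketch below computes the active share for each MoE entry in this listing. The parameter counts are copied from the cards; `active_fraction` is a hypothetical helper. As a general rule of thumb (an assumption here, not a claim from the page), memory has to hold the total parameter count while per-token compute tracks the active count.

```python
# (total parameters, active parameters per token), in billions, from the cards
MOE_MODELS = {
    "DeepSeek V4 Pro": (1600, 49),
    "DeepSeek V4 Flash": (284, 13),
    "DeepSeek V3 671B": (671, 37),
    "DeepSeek V2.5 236B": (236, 21),
    "DeepSeek Coder V2 16B": (16, 2.4),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters used for each token in a sparse MoE model."""
    return active_b / total_b


for name, (total_b, active_b) in MOE_MODELS.items():
    pct = 100 * active_fraction(total_b, active_b)
    print(f"{name}: {pct:.1f}% of parameters active per token")
```

For DeepSeek V3 this works out to about 5.5% of the weights being used per token, which is why the model can be much faster than a dense 671B model while still needing memory for all 671B parameters.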

DeepSeek Coder V2 16B
16B (2.4B active) · 131K ctx · 9.8 GB · current
moe · Mid

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks.

DeepSeek R1 Distill 70B
70B · 131K ctx · 42.7 GB · frontier
dense · Mid

DeepSeek R1 Distill 70B is a reasoning model distilled from DeepSeek-R1 onto a Llama 3.3 70B Instruct base, offering strong chain-of-thought reasoning at a practical size.

DeepSeek R1 Distill 32B
32B · 33K ctx · 19.5 GB · frontier
dense · Mid

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

DeepSeek R1 Distill 14B
14B · 33K ctx · 8.5 GB · frontier
dense · Mid

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

DeepSeek R1 Distill 7B
7B · 33K ctx · 4.3 GB · active
dense · Budget

DeepSeek R1 Distill Qwen 7B is a 7B-parameter reasoning model distilled from the larger DeepSeek-R1. Based on Qwen2.5-Math-7B and fine-tuned on 800K samples from DeepSeek-R1, it scores 92.8% on MATH-500 and 49.1% on GPQA Diamond while being far cheaper to run than the full 671B model.

DeepSeek R1 Distill 8B
8B · 33K ctx · 4.9 GB · frontier
dense · Budget

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

DeepSeek R1 1.5B
1.5B · 33K ctx · 0.9 GB · active
dense · Legacy

DeepSeek R1 Distill Qwen 1.5B is a compact reasoning model distilled from DeepSeek-R1, based on Qwen2.5-Math-1.5B. Fine-tuned on 800K curated samples, it achieves 83.9% on MATH-500 and supports chain-of-thought reasoning on resource-constrained devices.

DeepSeek LLM 67B
67B · 4K ctx · 40.9 GB · legacy
dense · Legacy

Introducing DeepSeek LLM, an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.

DeepSeek LLM 7B
7B · 4K ctx · 4.3 GB · legacy
dense · Legacy

Introducing DeepSeek LLM, an advanced language model comprising 7 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.
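A note on the size column: the GB figures in the cards above all appear consistent with a single ratio of about 0.61 GB per billion parameters (roughly 4.9 bits per parameter, in the range one might expect from a 4-bit quantization with some overhead). The page does not state how the sizes are derived, so the constant below is an inference from the listed numbers, not a documented formula, and `GB_PER_BILLION` is a hypothetical name.

```python
# (parameters in billions, size in GB) as listed in the cards above
LISTED = {
    "DeepSeek V4 Pro": (1600, 976.0),
    "DeepSeek V4 Flash": (284, 173.2),
    "DeepSeek V3.2": (671, 409.3),
    "DeepSeek V2.5 236B": (236, 144.0),
    "DeepSeek Coder V2 16B": (16, 9.8),
    "DeepSeek R1 Distill 70B": (70, 42.7),
    "DeepSeek R1 Distill 32B": (32, 19.5),
    "DeepSeek R1 Distill 14B": (14, 8.5),
    "DeepSeek R1 Distill 8B": (8, 4.9),
    "DeepSeek R1 Distill 7B": (7, 4.3),
    "DeepSeek R1 1.5B": (1.5, 0.9),
    "DeepSeek LLM 67B": (67, 40.9),
}

GB_PER_BILLION = 0.61  # assumption: inferred from the listing, not documented

for name, (params_b, listed_gb) in LISTED.items():
    estimate_gb = params_b * GB_PER_BILLION
    print(f"{name}: listed {listed_gb} GB, estimated {estimate_gb:.1f} GB")

# Largest gap between the listed size and the 0.61 GB/B estimate
max_gap_gb = max(abs(p * GB_PER_BILLION - gb) for p, gb in LISTED.values())
```

If the inference holds, the listed sizes are uniform estimates rather than measured checkpoint sizes, which would explain why the V4 Pro and V4 Flash cards quote different on-disk footprints in their descriptions (roughly 862 GB and 158 GB) than in their size fields (976 GB and 173.2 GB).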