DeepSeek V4 Flash
DeepSeek V4 Flash (284B parameters) requires approximately 160.8 GB of VRAM with NVFP4 quantization. As a Mixture of Experts model with 13B active parameters, it needs far less compute per token than its total parameter count suggests, but all of the weights must still fit in memory. For the best balance of quality and speed, we recommend hardware with at least 185 GB of VRAM.
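The two thresholds quoted above can be turned into a quick fit check. This is a minimal sketch that assumes only the figures stated in this section (160.8 GB required, 185 GB recommended, 284B total and 13B active parameters). The helper names `classify_fit` and `active_fraction` and the tier labels are hypothetical and do not belong to any tool.

```python
# Figures quoted above for DeepSeek V4 Flash with NVFP4 quantization.
REQUIRED_GB = 160.8      # approximate VRAM needed to run the model
RECOMMENDED_GB = 185.0   # recommended VRAM for a balance of quality and speed
TOTAL_PARAMS_B = 284     # total parameters, in billions
ACTIVE_PARAMS_B = 13     # parameters active per token, in billions


def classify_fit(available_gb: float) -> str:
    """Classify a memory budget against the quoted thresholds (hypothetical helper)."""
    if available_gb < REQUIRED_GB:
        return "too small"
    if available_gb < RECOMMENDED_GB:
        return "fits, below recommended"
    return "recommended"


def active_fraction() -> float:
    """Share of the total parameters that is active for each token."""
    return ACTIVE_PARAMS_B / TOTAL_PARAMS_B


if __name__ == "__main__":
    for gb in (96, 176, 192):
        print(f"{gb} GB: {classify_fit(gb)}")
    print(f"active share per token: {active_fraction():.1%}")
```

The thresholds are constants so that they can be swapped for the estimate of a different quantization level.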
Get started
Copy-paste commands to run DeepSeek V4 Flash on your machine.
Run
docker run --rm -it ghcr.io/ggerganov/llama.cpp:full \
--hf-repo "deepseek-ai/DeepSeek-V4-Flash" \
--hf-file "DeepSeek-V4-Flash-NVFP4.gguf" \
-c 4096 -ngl 99
Quick specs
About this model
- 284B total / 13B active sparse MoE: 256 routed + 1 shared expert
- Native FP4 experts: ~158 GB on disk
- 1M-token context with near-frontier coding quality
- Runs on a single 192 GB unified-memory box or a small GPU server
Quantization options
VRAM estimates by quant level
| Quant | Bits | VRAM | Quality |
|---|---|---|---|
| Q2_K | 2 | 110.8 GB | Low |
| Q3_K_S | 3 | 139.2 GB | Low |
| NVFP4 | 4 | 159.0 GB | Medium |
| Q4_K_M | 4 | 173.2 GB | Medium |
| Q5_K_M | 5 | 204.5 GB | High |
| Q6_K | 6 | 232.9 GB | High |
| Q8_0 | 8 | 303.9 GB | Very High |
| F16 | 16 | 582.2 GB | Maximum |
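To make the table easier to apply, here is a minimal sketch that picks the largest quantization fitting a given memory budget and derives the effective bits per parameter implied by each estimate. The VRAM figures and the 284B parameter count come from this page. The names `QUANTS`, `best_quant`, and `effective_bits` are hypothetical, and treating 1 GB as 10^9 bytes is an assumption.

```python
from typing import Optional

TOTAL_PARAMS = 284e9  # total parameter count quoted on this page

# (quant name, nominal bits, estimated VRAM in GB), copied from the table above.
QUANTS = [
    ("Q2_K", 2, 110.8),
    ("Q3_K_S", 3, 139.2),
    ("NVFP4", 4, 159.0),
    ("Q4_K_M", 4, 173.2),
    ("Q5_K_M", 5, 204.5),
    ("Q6_K", 6, 232.9),
    ("Q8_0", 8, 303.9),
    ("F16", 16, 582.2),
]


def best_quant(budget_gb: float) -> Optional[str]:
    """Return the quant with the largest estimate that fits the budget, or None."""
    fitting = [(vram, name) for name, _bits, vram in QUANTS if vram <= budget_gb]
    return max(fitting)[1] if fitting else None


def effective_bits(vram_gb: float) -> float:
    """Bits per parameter implied by a VRAM estimate (assumes 1 GB = 1e9 bytes)."""
    return vram_gb * 1e9 * 8 / TOTAL_PARAMS


if __name__ == "__main__":
    print("192 GB budget:", best_quant(192))
    for name, bits, vram in QUANTS:
        print(f"{name}: nominal {bits} bits, effective {effective_bits(vram):.2f} bits")
```

Under these assumptions the effective bits per parameter come out above the nominal value in every row. That is consistent with the estimates including overhead beyond the raw weights, although the page does not say what that overhead consists of.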
Quality benchmarks
DeepSeek V4 Flash benchmark scores
Coding
Reasoning
Source: vendor-reported · 2026-04-24
Memory breakdown
Reference: RTX 2060 6GB
