URL: https://willitrunai.com/models/nemotron-cascade-2-30b-a3b

Nemotron Cascade 2 30B A3B VRAM Requirements: GPU Compatibility


NVIDIA

Nemotron Cascade 2 30B A3B

Frontier
Available on: HuggingFace, Ollama
Released: Mar 2026
Context: 262K tokens
License: NVIDIA Open Model License
Quality: 88 (Strong)

Nemotron Cascade 2 30B A3B (30B parameters) requires approximately 23.0 GB of VRAM with Q4_K_M quantization: 18.3 GB for the weights plus KV cache, runtime overhead, and headroom. As a Mixture of Experts model with 3B active parameters, it needs far less compute per token than its total parameter count suggests, although all 30B parameters must still fit in memory. For the best balance of quality and speed, we recommend hardware with at least 27 GB of VRAM.

Get started

Copy-paste commands to run Nemotron Cascade 2 30B A3B on your machine.

Run

ollama run nemotron-cascade-2
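
Once the model is running, it can also be queried programmatically. The following is a minimal sketch, assuming a local Ollama server listening on its default port 11434 and using its `/api/generate` endpoint. The model tag is the one from the command above; the helper names `build_request` and `generate` are hypothetical and chosen only for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Tag taken from the `ollama run` command above.
MODEL_TAG = "nemotron-cascade-2"


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Explain mixture-of-experts in one sentence.")` should return the model's reply as a string, provided the server is up and the model has been pulled.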

Quick specs

Parameters: 30B (3B active)
Architecture: MoE
Context: 262K tokens
Modality: text
Min RAM: 11.7 GB
Rec. RAM: 18.3 GB (Q4_K_M)
License: NVIDIA Open Model License
Family: Nemotron
✓ Code  ✓ Chat  ✓ Reasoning

About this model

NVIDIA Nemotron Cascade 2 is a 30B MoE model with 3B active parameters that uses a Mamba-2 + Transformer hybrid architecture. It achieved gold-medal performance at IMO 2025 and IOI 2025, and it scores 92.4% on AIME 2025 and 87.2% on LiveCodeBench v6. It fits on a single RTX 4090.

  • MoE: 30B total / 3B active; runs on a single RTX 4090
  • Mamba-2 + Transformer hybrid architecture
  • Gold medal at IMO 2025, IOI 2025, ICPC 2025
  • 92.4% AIME 2025, 87.2% LiveCodeBench v6
  • 262K context window
  • Thinking + instruct dual mode

Quick picks

Best budget (S): Mac mini M4 64GB, ~$1,099, 13 tok/s
Best overall (S): RTX 5090 32GB, ~$1,999, 186 tok/s

Best hardware

Top picks for Nemotron Cascade 2 30B A3B

RTX 5090 32GB (S): 32 GB
RTX PRO 4500 Blackwell 32GB (S): 32 GB
NVIDIA V100 32GB (S): 32 GB
AMD Instinct MI100 32GB (S): 32 GB
AMD Instinct MI60 32GB (S): 32 GB

Run this model

Nemotron Cascade 2 30B A3B on RTX 5090 32GB
Nemotron Cascade 2 30B A3B on RTX PRO 4500 Blackwell 32GB
Nemotron Cascade 2 30B A3B on NVIDIA V100 32GB

Quantization options

VRAM estimates by quant level

Quant     Bits   VRAM      Quality
Q2_K      2      11.7 GB   Low
Q3_K_S    3      14.7 GB   Low
NVFP4     4      16.8 GB   Medium
Q4_K_M    4      18.3 GB   Medium
Q5_K_M    5      21.6 GB   High
Q6_K      6      24.6 GB   High
Q8_0      8      32.1 GB   Very High
F16       16     61.5 GB   Maximum
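
The estimates above can be used to pick a quant level for a given card. The sketch below selects the largest quant whose estimate fits within a VRAM budget. The Q4_K_M figure here matches the weights figure in the memory breakdown, so the table values appear to cover weights only; the optional `overhead_gb` parameter is an assumption added to account for KV cache, runtime, and headroom. The function name `best_quant_for` is hypothetical.

```python
from typing import Optional, Tuple

# (quant, bits, VRAM estimate in GB, quality), copied from the table above.
QUANTS = [
    ("Q2_K", 2, 11.7, "Low"),
    ("Q3_K_S", 3, 14.7, "Low"),
    ("NVFP4", 4, 16.8, "Medium"),
    ("Q4_K_M", 4, 18.3, "Medium"),
    ("Q5_K_M", 5, 21.6, "High"),
    ("Q6_K", 6, 24.6, "High"),
    ("Q8_0", 8, 32.1, "Very High"),
    ("F16", 16, 61.5, "Maximum"),
]


def best_quant_for(
    vram_gb: float, overhead_gb: float = 0.0
) -> Optional[Tuple[str, int, float, str]]:
    """Return the largest quant whose estimate plus overhead fits, else None."""
    fitting = [q for q in QUANTS if q[2] + overhead_gb <= vram_gb]
    return max(fitting, key=lambda q: q[2]) if fitting else None


# A 24 GB card, ignoring overhead, then with the 4.7 GB of KV cache,
# runtime, and headroom listed in the memory breakdown.
print(best_quant_for(24.0))
print(best_quant_for(24.0, overhead_gb=4.7))
```

With overhead included, a 24 GB card drops from Q5_K_M to Q4_K_M, which is consistent with the 23.0 GB figure quoted for Q4_K_M at the top of the page.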

Quality benchmarks

Nemotron Cascade 2 30B A3B benchmark scores

Benchmark verified

Coding

SWE-bench Verified: n/a
HumanEval+: n/a
Aider Polyglot: n/a
LiveCodeBench: 87.2%

Reasoning

MMLU-Pro: 79.8%
GPQA Diamond: 76.1%
MATH-500: n/a
ARC Challenge: n/a

Source: official · 2026-03-19

Memory breakdown

Reference: RTX 2060 6GB

Weights: 18.3 GB
KV Cache: 2.9 GB
Runtime: 1.2 GB
Headroom: 0.6 GB
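
The four components add up to the 23.0 GB quoted for Q4_K_M. As a quick check, the sketch below sums them and compares the total against a card's VRAM. The dictionary keys and function names are hypothetical; the figures are those listed above.

```python
# Memory components in GB for Q4_K_M, as listed in the breakdown above.
BREAKDOWN_GB = {
    "weights": 18.3,
    "kv_cache": 2.9,
    "runtime": 1.2,
    "headroom": 0.6,
}


def total_vram_gb(parts: dict) -> float:
    """Sum the components, rounded to one decimal to hide float noise."""
    return round(sum(parts.values()), 1)


def fits(parts: dict, vram_gb: float) -> bool:
    """True if the summed requirement fits within the given VRAM."""
    return total_vram_gb(parts) <= vram_gb


print(total_vram_gb(BREAKDOWN_GB))  # 23.0
print(fits(BREAKDOWN_GB, 24.0))     # True: a 24 GB card such as the RTX 4090
print(fits(BREAKDOWN_GB, 6.0))      # False: the RTX 2060 6GB reference card
```

To estimate another quant level, one could replace the `weights` entry with the corresponding value from the quantization table, assuming the other three components stay roughly constant.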

See also

Quantization Guide
Scoring Methodology
VRAM Calculator