
Short answer: 24 GB of VRAM is the 2026 sweet spot. 16 GB still runs a strong lineup, and 32 GB adds headroom for Q5/Q6 quants and 1M-context Qwen 3.6 workloads.

This guide lists the local LLMs that fit on 16 GB, 24 GB, and 32 GB of VRAM as of April 2026: top picks per tier, tokens per second on common GPUs, coding vs. chat picks, and when upgrading is actually worth it.

Need a fit check for a specific GPU? Use the VRAM Calculator. For rankings on a specific Apple Silicon Mac, see Best Local LLMs for MacBook Air M4 24GB or MacBook Pro M4 Pro 24GB.

16 GB VRAM — Mid-range (RTX 4060 Ti 16GB, RTX 4070 Ti Super 16GB, RTX 5070 Ti 16GB, RTX 4080 16GB, RX 7900 GRE)

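The sweet spot for dense 9-27B models. Qwen3.6-27B (released April 22, 2026) is the best model at this tier, but its Q4_K_M weights (16.8 GB) slightly exceed 16 GB, so expect to offload a few layers to system RAM or step down to a smaller quant.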

Use case	Model	VRAM (Q4)	Best quant	tok/s (RTX 4070 Ti Super)
Best overall (NEW)	Qwen3.6-27B	16.8 GB	Q4_K_M	~38
Best coding / agentic	Qwen3.6-27B	16.8 GB	Q4_K_M	~38
General chat	Qwen 3.5 9B	~5.1 GB	Q8_0 (~10 GB)	~70
Coding (small / fast)	Qwen 3 Coder 14B	~8.3 GB	Q6_K (~12 GB)	~50
Previous-gen dense	Qwen 3.5 27B	~16 GB	Q4_K_M tight	~38
Instruction-follow	Llama 3.1 8B	~4.6 GB	Q8_0 (~8 GB)	~80
Reasoning / Math	DeepSeek R1 Distill 14B	~8 GB	Q5_K_M	~60

Does NOT fit at a useful quant: Qwen3.6-35B-A3B MoE (~21 GB), Qwen 3.5 35B-A3B (~21 GB), DeepSeek R1 Distill 32B (~19 GB at Q4), or any Llama 4 variant.
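The sizes in these tables follow roughly from parameter count × bits per weight ÷ 8. Below is a minimal Python sketch of that arithmetic plus a crude fit check. It assumes approximate llama.cpp averages for the K-quant types and a hypothetical flat 1.5 GB allowance for KV cache and runtime buffers (the real figure grows with context length).

```python
# Approximate average bits per weight for llama.cpp GGUF quants.
# Q6_K and Q8_0 are fixed-size formats; the K_M values are rough
# averages (assumption) because those quants mix precisions per tensor.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q6_K": 6.5625,
    "Q8_0": 8.5,
}


def weights_gb(params_billion: float, quant: str) -> float:
    """Estimated size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits(params_billion: float, quant: str, vram_gb: float,
         overhead_gb: float = 1.5) -> bool:
    """True if weights plus a flat allowance (hypothetical default) fit in VRAM."""
    return weights_gb(params_billion, quant) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for name, params, quant in [
        ("27B dense", 27, "Q4_K_M"),
        ("35B-A3B MoE", 35, "Q4_K_M"),
        ("14B dense", 14, "Q6_K"),
    ]:
        size = weights_gb(params, quant)
        print(f"{name} @ {quant}: ~{size:.1f} GB weights, "
              f"fits in 16 GB: {fits(params, quant, 16)}")
```

The same arithmetic explains the later upgrade triggers: 35B at Q8_0 is about 37 GB of weights (35 × 8.5 ÷ 8), which is why Q8 on 35B-A3B points to a 48 GB class machine rather than a 32 GB card.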

Upgrade trigger: If you want MoE efficiency or long-context agentic workloads, jump to 24 GB.

24 GB VRAM — Enthusiast (RTX 4090, RTX 3090, RX 7900 XTX, Mac M4 Pro 24GB)

The 2026 sweet spot: it handles the best MoE models, the top dense 27-32B models, and long contexts up to 262K tokens.
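Long contexts cost memory on top of the weights. For standard full-attention layers with grouped-query attention, the KV cache grows linearly with context length. Here is a minimal sketch. The layer and head counts are assumed, illustrative values for a 32B-class model, so read the real ones from the model's config.json.

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                n_tokens: int, bytes_per_elem: float = 2.0) -> float:
    """K and V cache size in GB for full-attention layers (FP16 by default)."""
    # Factor 2 = one tensor for keys and one for values, in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem / 1e9


# Hypothetical 32B-class shape: 64 layers, 8 KV heads, head_dim 128.
for ctx in (8_192, 32_768, 131_072, 262_144):
    print(f"{ctx:>7} tokens: ~{kv_cache_gb(64, 8, 128, ctx):.1f} GB FP16 KV cache")
```

With these assumed numbers, 32K tokens already adds about 8.6 GB. Very long contexts on a 24 GB card therefore realistically depend on a quantized KV cache, a smaller model, or an architecture whose attention layers keep less state (sliding-window or linear-attention layers). Treat the formula as an upper bound for plain full attention.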

Use case	Model	VRAM Q4	Best quant	tok/s (RTX 4090)
Best coding / flagship (NEW)	Qwen3.6-27B dense	16.8 GB	Q6_K (22.5 GB)	~60
Best MoE throughput	Qwen3.6-35B-A3B	~21 GB	Q4_K_M	~70
Best coding (prev-gen)	Qwen 3 Coder 30B-A3B	~17 GB	Q4_K_M	~75
Dense reasoning	Qwen 3 32B	~19 GB	Q4_K_M	~55
Prev-gen MoE	Qwen 3.5 35B-A3B	~21 GB	Q4_K_M	~70
Math / Code	DeepSeek R1 Distill 32B	~19 GB	Q4_K_M	~50

Does NOT fit at useful quant: Llama 4 Maverick (requires 128GB), DeepSeek V3 full, Qwen 3.5 122B-A10B (needs 80GB).

Upgrade trigger: If you need higher-precision quants of 35B-A3B for coding (Q5/Q6 fits in 32 GB; Q8 needs 48 GB+) or 1M-context workflows, jump to 32 GB or a Mac with 36-64 GB.

32 GB VRAM — High-end consumer (RTX 5090 32GB)

Runs 35B-A3B at Q5/Q6, Llama 4 Scout with partial offload to system RAM, and Qwen 3.6 at up to 1M context.

Use case	Model	VRAM	Best quant	tok/s (RTX 5090)
Best coding (NEW)	Qwen3.6-27B dense	~28.6 GB	Q8_0	~85
Best MoE	Qwen3.6-35B-A3B	~25 GB	Q5_K_M	~90
Best prev-gen coding	Qwen 3 Coder 30B-A3B	~25 GB	Q6_K	~85
Long context	Qwen 3.5 27B (128K)	~18 GB	Q8_0	~55
Dense reasoning	Qwen 3 32B	~23 GB	Q5_K_M	~60
Partial offload	Llama 4 Scout 109B	~28 GB (partial)	Q4 offload	~15
1M-context (Qwen 3.6)	Qwen3.6-35B-A3B	~21-40 GB	Q4-Q5 YaRN	~90
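Partial offload means keeping only some transformer layers in VRAM and running the rest from system RAM. In llama.cpp this is controlled with the `--n-gpu-layers` (`-ngl`) option. The rough sketch below picks that number. It assumes equally sized layers and ignores embedding/output tensors and the KV cache. The 48-layer count and the ~4.85 bits per weight used for Llama 4 Scout are assumptions to verify against the GGUF metadata.

```python
import math


def gpu_layers(model_gb: float, n_layers: int, gpu_budget_gb: float) -> int:
    """Equally sized layers that fit in the GPU budget; the rest run from system RAM."""
    per_layer_gb = model_gb / n_layers
    return min(n_layers, math.floor(gpu_budget_gb / per_layer_gb))


# Assumed: ~109B total parameters at ~4.85 bits/weight, 48 layers, 28 GB usable VRAM.
scout_gb = 109 * 4.85 / 8
n = gpu_layers(scout_gb, 48, 28)
print(f"~{scout_gb:.0f} GB of weights -> put {n} of 48 layers on the GPU")
```

Most of the weights are then read from much slower system RAM for every token, so throughput drops sharply. That is consistent with the low tok/s figure in the table for Scout.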

Upgrade trigger: For 35B-A3B at Q8 (near-FP16 quality) or running several models concurrently, go to 48 GB+ (RTX A6000, Mac M4 Max 64GB).

Apple Silicon unified memory equivalents

Unified memory behaves differently: macOS reserves roughly 15-25% of it for the system, leaving this much effective LLM headroom:

Mac config	Effective LLM RAM	Closest GPU tier
MacBook Air M4 16GB	~12 GB	RTX 4060 Ti 16GB
MacBook Air M4 24GB	~19 GB	between 16 and 24 GB tiers
MacBook Pro M4 24GB	~19 GB	between 16 and 24 GB tiers
MacBook Pro M4 Pro 24GB	~20 GB	24 GB class (higher bandwidth)
MacBook Pro M4 Max 36GB	~30 GB	32 GB class
MacBook Pro M4 Max 48GB	~40 GB	32-48 GB class
MacBook Pro M4 Max 64GB	~54 GB	48 GB class
Mac Studio M3 Ultra 96GB	~80 GB	workstation

See MacBook Air M4 vs Pro M4 for Local LLMs for the full decision guide.

Expected tokens per second (Q4_K_M)

Model	RTX 4060 Ti 16GB	RTX 4090 24GB	RTX 5090 32GB	Mac M4 Pro 24GB	Mac M4 Max 64GB
Qwen 3 8B	50	85	110	40	45
Qwen 3.5 9B	50	85	110	40	45
Qwen 3 14B	35	55	70	22	28
Qwen 3.5 27B	—	35	45	18	24
Qwen 3 30B-A3B MoE	—	70	90	34	42
Qwen 3.5 35B-A3B	—	65	85	30	40
DeepSeek R1 Distill 32B	—	45	60	20	26
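Single-user decode speed is usually limited by memory bandwidth. Each generated token has to stream the active weights from memory at least once, so bandwidth ÷ bytes per token gives a rough ceiling. The sketch below assumes published peak bandwidth figures (worth double-checking for your exact SKU) and ~4.85 bits per weight for Q4_K_M. For MoE models only the active parameters count.

```python
# Peak memory bandwidth in GB/s (assumed spec-sheet values).
BANDWIDTH_GB_S = {
    "RTX 4060 Ti 16GB": 288,
    "RTX 4090 24GB": 1008,
    "RTX 5090 32GB": 1792,
    "Mac M4 Pro": 273,
    "Mac M4 Max (40-core GPU)": 546,
}


def decode_ceiling_tok_s(bandwidth_gb_s: float, active_params_billion: float,
                         bits_per_weight: float = 4.85) -> float:
    """Upper bound on tokens/s if each token reads all active weights once."""
    gb_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_per_token


for gpu, bw in BANDWIDTH_GB_S.items():
    dense = decode_ceiling_tok_s(bw, 8)  # 8B dense model
    moe = decode_ceiling_tok_s(bw, 3)    # ~3B active parameters (A3B MoE)
    print(f"{gpu}: 8B dense <= ~{dense:.0f} tok/s, A3B MoE <= ~{moe:.0f} tok/s")
```

Measured speeds sit below these ceilings because the KV cache is read too and kernels rarely reach peak bandwidth. MoE models with few active parameters usually fall furthest short, since compute and expert-routing overhead take over. That is why the A3B models in the table beat dense 27-32B models but stay far from their theoretical ceiling.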

Decision framework

You mostly chat or type slowly: 16 GB is fine; stick with Qwen 3.5 9B at Q8.
You code all day: 24 GB (RTX 4090/3090) + Qwen 3 Coder 30B-A3B. Best ROI.
You want MoE + long context: 32 GB RTX 5090 or Mac M4 Max 36-48 GB.
You run a team workstation: 48-64 GB (Mac Studio or MacBook Pro with M4 Max) for Q8 on 35B-A3B.
You serve an API to multiple users: skip consumer GPUs and go with an H100 80GB or a multi-GPU datacenter setup (see Multi-GPU LLM Inference Guide).

Related guides

Qwen 3.6 VRAM & Release Date — latest flagship MoE
Qwen3.6-35B-A3B Hardware Requirements (Buyer Guide)
Best Local Coding LLMs for Apple Silicon 24GB
Best GPU for Running LLMs Locally (2026)
Best Local LLMs by VRAM Tier — 11 tiers ranked
VRAM Calculator — check any combo

URL: https://willitrunai.com/blog/what-can-you-run-on-16gb-24gb-32gb-vram

What Can You Run on 16GB, 24GB, 32GB VRAM? — Local LLM Guide (April 2026) | Will It Run AI Blog