Qwen3.5-35B-EXL3-4bpw

EXL3 quantization of Qwen/Qwen3.5-35B-A3B-Base at 4 bits per weight (4bpw).

At a glance

Property	Value
Base model	Qwen/Qwen3.5-35B-A3B-Base
Format	EXL3-4bpw
Total params	35B
Active params / token	3B
Experts / layer	—
Layers	—
Hidden size	—
Context	—
On-disk size	21 GB
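
The sketch below shows how the numbers in the table relate. It is a back-of-the-envelope estimate, not a description of how EXL3 lays out tensors. It assumes decimal units (1 GB = 10^9 bytes) and a uniform 4.0 bits per weight. At that rate, 35B weights alone come to about 17.5 GB, and the 21 GB on-disk figure implies an average of about 4.8 bits per parameter. This card does not say whether 21 GB is decimal or binary, or which tensors account for the gap, so treat the remainder as unexplained here. The "active" figure approximates the weight bytes touched per decoded token in this mixture-of-experts model.

```python
# Back-of-the-envelope size estimates for a bits-per-weight (bpw) quantized model.
# Assumptions: decimal units (1 GB = 1e9 bytes), a uniform 4.0 bpw for every
# weight, and parameter counts taken from the "At a glance" table.

TOTAL_PARAMS = 35e9   # total parameters
ACTIVE_PARAMS = 3e9   # parameters active per token (MoE routing)
BPW = 4.0             # nominal bits per weight for this build
ON_DISK_GB = 21.0     # reported on-disk size (unit convention assumed decimal)


def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store num_params weights at bits_per_weight bits each."""
    return num_params * bits_per_weight / 8


def effective_bpw(size_bytes: float, num_params: float) -> float:
    """Average bits per parameter implied by a total file size."""
    return size_bytes * 8 / num_params


if __name__ == "__main__":
    total_gb = weight_bytes(TOTAL_PARAMS, BPW) / 1e9
    active_gb = weight_bytes(ACTIVE_PARAMS, BPW) / 1e9
    implied = effective_bpw(ON_DISK_GB * 1e9, TOTAL_PARAMS)
    print(f"all weights at {BPW} bpw: {total_gb:.1f} GB")           # 17.5 GB
    print(f"active weights per token: {active_gb:.1f} GB")           # 1.5 GB
    print(f"implied average bpw from on-disk size: {implied:.1f}")  # 4.8
```

The same two helpers work for any bpw build: multiply parameters by bits and divide by 8 for bytes, or invert that to recover the average bits per parameter from a file size.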

Which variant should I pick?

Variant	Format	Link
`Qwen3.5-264B`	BF16	link
`Qwen3.5-264B-FP8`	FP8	link
`Qwen3.5-264B-W4A16`	W4A16	link
`Qwen3.5-28B`	BF16	link
`Qwen3.5-35B-EXL3-4bpw` (this)	EXL3-4bpw	link
`Qwen3.5-76B`	BF16	link
`Qwen3.5-76B-GGUF`	GGUF	link
`Qwen3.5-88B`	BF16	link
`Qwen3.5-99B`	BF16	link
`Qwen3.5-99B-GGUF`	GGUF	link

This card covers only the EXL3-4bpw build. See the upstream base model card for architecture details, benchmarks, and general usage.

License & citation

The license is inherited from the base model.

@misc{lasby2025reap,
  title         = {REAP the Experts: Why Pruning Prevails for One-Shot MoE Compression},
  author        = {Mike Lasby and Ivan Lazarevich and Nish Sinnadurai and Sean Lie and Yani Ioannou and Vithursan Thangarasa},
  year          = {2025},
  eprint        = {2510.13999},
  archivePrefix = {arXiv}
}