URL: https://huggingface.co/0xSero/Qwen3.5-35B-EXL3-4bpw

Support this work → · X · GitHub · REAP paper · Cerebras REAP

Qwen3.5-35B-EXL3-4bpw

EXL3-4bpw quantization of Qwen/Qwen3.5-35B-A3B-Base.

At a glance

Base model        Qwen/Qwen3.5-35B-A3B-Base
Format            EXL3-4bpw
Total params      35B
Active / token    3B
Experts / layer
Layers
Hidden size
Context
On-disk size      21 GB
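
As a rough sanity check on the figures above, the sketch below compares the nominal size of 35B parameters at 4 bits per weight with the listed 21 GB on-disk size. It assumes decimal gigabytes (1 GB = 10^9 bytes) and treats "35B" as exactly 35e9 parameters; both are assumptions, and the helper names are illustrative only. The card does not explain the gap between the two numbers, so treat the result as an estimate, not a specification.

```python
# Back-of-the-envelope size check for the "At a glance" figures.
# Assumptions (not stated in the card): 1 GB = 1e9 bytes, 35B = 35e9 parameters.
GB = 1e9


def nominal_size_gb(total_params: float, bits_per_weight: float) -> float:
    """Size in GB if every parameter used exactly `bits_per_weight` bits."""
    return total_params * bits_per_weight / 8 / GB


def effective_bpw(size_gb: float, total_params: float) -> float:
    """Average bits per parameter implied by a given on-disk size."""
    return size_gb * GB * 8 / total_params


total_params = 35e9
nominal = nominal_size_gb(total_params, 4.0)   # size at a flat 4 bpw
effective = effective_bpw(21, total_params)    # implied by the listed 21 GB

print(f"nominal size at 4 bpw: {nominal:.1f} GB")
print(f"effective bits per weight on disk: {effective:.2f}")
```

Under these assumptions the listed size corresponds to somewhat more than 4 bits per parameter on average, so plan storage around the listed 21 GB, not the nominal figure.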

Which variant should I pick?

Variant                         Format      Link
Qwen3.5-264B                    BF16        link
Qwen3.5-264B-FP8                FP8         link
Qwen3.5-264B-W4A16              W4A16       link
Qwen3.5-28B                     BF16        link
Qwen3.5-35B-EXL3-4bpw (this)    EXL3-4bpw   link
Qwen3.5-76B                     BF16        link
Qwen3.5-76B-GGUF                GGUF        link
Qwen3.5-88B                     BF16        link
Qwen3.5-99B                     BF16        link
Qwen3.5-99B-GGUF                GGUF        link

This card covers only the EXL3-4bpw build. For architecture, benchmarks, and general usage, see the base model card upstream.

License & citation

License inherited from the base model.

@misc{lasby2025reap,
 title = {REAP the Experts: Why Pruning Prevails for One-Shot MoE Compression},
 author = {Mike Lasby and Ivan Lazarevich and Nish Sinnadurai and Sean Lie and Yani Ioannou and Vithursan Thangarasa},
 year = {2025}, eprint = {2510.13999}, archivePrefix = {arXiv}
}

Sponsors

Made possible by NVIDIA · TNG Technology · Lambda · Prime Intellect · Hot Aisle.

Safetensors
Model size: 11B params
Tensor type: F32 · F16 · I16 · BF16
