Support this work → · X · GitHub · REAP paper · Cerebras REAP

Qwen3.5-76B

At a glance


Base model	Qwen/Qwen3.5-122B-A10B
Format	BF16
Total params	76B
Active / token	10B
Experts / layer	—
Layers	—
Hidden size	—
Context	—
On-disk size	152 GB

Which variant should I pick?

Variant	Format	Link
`Qwen3.5-264B`	BF16	link
`Qwen3.5-264B-FP8`	FP8	link
`Qwen3.5-264B-W4A16`	W4A16	link
`Qwen3.5-28B`	BF16	link
`Qwen3.5-35B-EXL3-4bpw`	EXL3-4bpw	link
`Qwen3.5-76B` (this)	BF16	link
`Qwen3.5-76B-GGUF`	GGUF	link
`Qwen3.5-88B`	BF16	link
`Qwen3.5-99B`	BF16	link
`Qwen3.5-99B-GGUF`	GGUF	link

40% expert-pruned variant of Qwen3.5-122B-A10B using REAP (Routing-Enhanced Activation Pruning).

Model Details

Property	Value
Base Model	Qwen/Qwen3.5-122B-A10B
Architecture	Qwen3.5 MoE (GDN + Full Attention)
Original Experts	256 per layer
Pruned Experts	154 per layer (40% removed)
Active Parameters	~10B per token
Pruning Method	REAP with targeted refusal preservation
Preserve Threshold	80% (super-expert protection)
Calibration	reap-calibration-data-v1 — 23k benchmark-free samples
Maintainer	0xSero
Organization	Sybil Solutions
Project	REAP PR17

Usage

vllm serve 0xSero/Qwen3.5-76B \
 --tensor-parallel-size 4 \
 --enable-expert-parallel \
 --max-model-len 8192 \
 --trust-remote-code \
 --language-model-only \
 --dtype bfloat16

Important: Use --language-model-only flag — this is a text-only checkpoint pruned from the multimodal base model.

What is REAP?

REAP (Routing-Enhanced Activation Pruning) removes the least-activated experts from MoE models while preserving critical capabilities. It uses router activation patterns from a calibration dataset to identify dispensable experts, with special protection for safety-critical behaviors.

License

Same license as the base model (Qwen).

License & citation

License inherited from the base model.

@misc{lasby2025reap,
 title = {REAP the Experts: Why Pruning Prevails for One-Shot MoE Compression},
 author = {Mike Lasby and Ivan Lazarevich and Nish Sinnadurai and Sean Lie and Yani Ioannou and Vithursan Thangarasa},
 year = {2025}, eprint = {2510.13999}, archivePrefix = {arXiv}
}

Model tree for 0xSero/Qwen3.5-76B

Base model

Qwen/Qwen3.5-122B-A10B

Finetuned

(34)

this model

Space using 0xSero/Qwen3.5-76B 1

Collection including 0xSero/Qwen3.5-76B

REAP-pruned & quantized Qwen3.5 / 3.6 / Coder variants. • 15 items • Updated 19 days ago

Paper for 0xSero/Qwen3.5-76B

Paper • 2510.13999 • Published Oct 15, 2025 • 20

URL: https://huggingface.co/0xSero/Qwen3.5-76B

⇱ 0xSero/Qwen3.5-76B · Hugging Face