Qwen3.6-27B Hybrid-Optimized Quantization for 16 GB of VRAM

At ggufbench.com, we are always looking to advance local AI on average consumer hardware. We would love it if you took a minute to submit your performance results and launch arguments for this model, and others, on our website!

Quick Specs

File Size: 12.576 GiB
Avg Bits/Weight: 4.01 bpw
Target VRAM: 16 GB GPUs

Architecture-Aware Quant Strategy

Qwen3.6-27B is a hybrid Mamba/Transformer model. Not all layers serve the same purpose, and not all tensors tolerate quantization equally. This layout respects the architecture by:

Protecting pure-attention layers (blk.3,7,11...63) with higher precision for global reasoning and long-range focus.
Compressing SSM-dominated hybrid layers aggressively where the recurrent state carries the sequential load.
Preserving critical routing & projection tensors at native or near-native precision to prevent error compounding.
Downgrading resilient tensors (embeddings, FFN gate/up) where KLD sensitivity is flat and quality loss is imperceptible.

Benchmark Summary (WikiText-2, 580 chunks)

Metric	This	sokann (4.256 bpw)	bartowski Q3_K_M	mradermacher i1.IQ4_XS	bartowski IQ4_XS
Size (BPW)	4.01	4.256	4.270	4.483	4.556
Size (GiB)	12.576	13.327	13.370	14.036	14.266
Mean PPL(Q)	7.128552 ± 0.046785	7.098696 ± 0.047344	6.993009 ± 0.046208	7.020660 ± 0.046587	6.996323 ± 0.046332
Mean PPL(base)	6.900925 ± 0.045382	6.908506 ± 0.045543	6.908506 ± 0.045543	6.908506 ± 0.045543	6.908506 ± 0.045543
Cor(ln(PPL(Q)), ln(PPL(base)))	98.76%	99.19%	98.52%	99.30%	99.32%
Mean KLD	0.049767 ± 0.000745	0.033452 ± 0.000723	0.058818 ± 0.000881	0.046348 ± 0.000841	0.026270 ± 0.000653
Maximum KLD	22.599598	23.255085	24.616274	24.175169	22.992002
99.9% KLD	3.240748	2.907350	3.986622	3.614290	2.385293
RMS Δp	6.167 ± 0.054 %	4.936 ± 0.054 %	6.690 ± 0.059 %	5.867 ± 0.060 %	4.352 ± 0.057 %
Same top p	91.146 ± 0.074 %	92.427 ± 0.069 %	90.350 ± 0.077 %	93.903 ± 0.062 %	93.888 ± 0.062 %

Efficiency-First Parity: Achieves competitive quality at ~6% smaller size — PPL(Q) within 0.4% of sokann (4.256 bpw) and on par with bartowski Q3_K_M on KLD, all while saving ~750 MB of VRAM for larger context or higher batch sizes.

Acknowledgments

Special thanks to unsloth for their 9 TB of Qwen3.5 GGUF Benchmarks, which were instrumental in selecting the optimal quantization strategy for this model.
Thanks to bartowski for providing the calibration data used in this process.

Downloads last month: 3,222

GGUF

Model size

27B params

Architecture

qwen35

Hardware compatibility

We're not able to determine the quantization variants.

View all variants

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ggufbench/Qwen3.6-27B-4bpw-16GB-VRAM

Base model

Qwen/Qwen3.6-27B

Quantized

(535)

this model