VOOZH about

URL: https://huggingface.co/ggufbench/Qwen3.6-27B-4bpw-16GB-VRAM

⇱ ggufbench/Qwen3.6-27B-4bpw-16GB-VRAM · Hugging Face


Qwen3.6-27B Hybrid-Optimized Quantization for 16 GB of VRAM

At ggufbench.com, we are always looking to advance local AI on average consumer hardware. We would love it if you took a minute to submit your performance results and launch arguments for this model, and others, on our website!

Quick Specs

  • File Size: 12.576 GiB
  • Avg Bits/Weight: 4.01 bpw
  • Target VRAM: 16 GB GPUs

Architecture-Aware Quant Strategy

Qwen3.6-27B is a hybrid Mamba/Transformer model. Not all layers serve the same purpose, and not all tensors tolerate quantization equally. This layout respects the architecture by:

  1. Protecting pure-attention layers (blk.3,7,11...63) with higher precision for global reasoning and long-range focus.
  2. Compressing SSM-dominated hybrid layers aggressively where the recurrent state carries the sequential load.
  3. Preserving critical routing & projection tensors at native or near-native precision to prevent error compounding.
  4. Downgrading resilient tensors (embeddings, FFN gate/up) where KLD sensitivity is flat and quality loss is imperceptible.

Benchmark Summary (WikiText-2, 580 chunks)

Metric This sokann (4.256 bpw) bartowski Q3_K_M mradermacher i1.IQ4_XS bartowski IQ4_XS
Size (BPW) 4.01 4.256 4.270 4.483 4.556
Size (GiB) 12.576 13.327 13.370 14.036 14.266
Mean PPL(Q) 7.128552 ± 0.046785 7.098696 ± 0.047344 6.993009 ± 0.046208 7.020660 ± 0.046587 6.996323 ± 0.046332
Mean PPL(base) 6.900925 ± 0.045382 6.908506 ± 0.045543 6.908506 ± 0.045543 6.908506 ± 0.045543 6.908506 ± 0.045543
Cor(ln(PPL(Q)), ln(PPL(base))) 98.76% 99.19% 98.52% 99.30% 99.32%
Mean KLD 0.049767 ± 0.000745 0.033452 ± 0.000723 0.058818 ± 0.000881 0.046348 ± 0.000841 0.026270 ± 0.000653
Maximum KLD 22.599598 23.255085 24.616274 24.175169 22.992002
99.9% KLD 3.240748 2.907350 3.986622 3.614290 2.385293
RMS Δp 6.167 ± 0.054 % 4.936 ± 0.054 % 6.690 ± 0.059 % 5.867 ± 0.060 % 4.352 ± 0.057 %
Same top p 91.146 ± 0.074 % 92.427 ± 0.069 % 90.350 ± 0.077 % 93.903 ± 0.062 % 93.888 ± 0.062 %
  • Efficiency-First Parity: Achieves competitive quality at ~6% smaller size — PPL(Q) within 0.4% of sokann (4.256 bpw) and on par with bartowski Q3_K_M on KLD, all while saving ~750 MB of VRAM for larger context or higher batch sizes.

Acknowledgments

  • Special thanks to unsloth for their 9 TB of Qwen3.5 GGUF Benchmarks, which were instrumental in selecting the optimal quantization strategy for this model.
  • Thanks to bartowski for providing the calibration data used in this process.
Downloads last month
3,222
GGUF
Model size
27B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ggufbench/Qwen3.6-27B-4bpw-16GB-VRAM

Base model

Qwen/Qwen3.6-27B
Quantized
(535)
this model