This model was produced with llm-compressor, following the same approach used for RedHatAI/Qwen3.6-35B-A3B-NVFP4, with the following compression script.
NOTE: Unlike that model, this one also quantizes the linear_attn layers, which reduces weight memory and leaves more room for longer context lengths on RTX 5090 GPUs. Click the dropdown to see the full quantization script.
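To put the memory argument in the note above into rough numbers, here is a minimal back-of-the-envelope sketch. It assumes the usual NVFP4 layout (4-bit values with one FP8 E4M3 scale per block of 16 weights) and, purely for illustration, treats all 27B parameters (a count taken from the model name) as quantized. In practice some tensors stay in BF16, and the small per-tensor F32 scales are ignored here, so the real footprint will be somewhat higher. The function names are hypothetical helpers, not part of llm-compressor.

```python
def nvfp4_bits_per_weight(value_bits: int = 4, scale_bits: int = 8, block_size: int = 16) -> float:
    """Effective bits per weight: the 4-bit value plus its share of the per-block FP8 scale."""
    return value_bits + scale_bits / block_size


def weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Weight storage in GiB for a given parameter count and bit width."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: 27e9 parameters, all stored in a single format (an idealized case).
NUM_PARAMS = 27e9

bf16_gib = weight_gib(NUM_PARAMS, 16)
nvfp4_gib = weight_gib(NUM_PARAMS, nvfp4_bits_per_weight())

print(f"BF16 weights:  {bf16_gib:.1f} GiB")
print(f"NVFP4 weights: {nvfp4_gib:.1f} GiB")
```

Under these assumptions, whatever the weights do not occupy on a 32 GB RTX 5090 is what remains for the KV cache and activations, which is why quantizing the additional linear_attn layers helps at longer context lengths.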
Safetensors model size: 17B params. Tensor types: F32, BF16, F8_E4M3, U8.
Model: Peutlefaire/Qwen3.6-27B-NVFP4 (base model: Qwen/Qwen3.6-27B)