NVFP4 Quantized RedHatAI/Qwen3.6-35B-A3B-NVFP4

This is a preliminary version (and subject to change) of NVFP4 quantized Qwen/Qwen3.6-35B-A3B model. The model has both weights and activations quantized to NVFP4 format with vllm-project/llm-compressor.

It is compatible and tested against vllm main. Deploy it with: vllm serve RedHatAI/Qwen3.6-35B-A3B-NVFP4 --reasoning-parser qwen3 --moe_backend flashinfer_cutlass

Creation Script:

Run this script with LLM Compressor main and latest transformers.

Preliminary Evaluations

GSM8K Platinum:

lm_eval --model local-chat-completions \
 --tasks gsm8k_platinum_cot_llama \
 --model_args "model=RedHatAI/Qwen3.6-35B-A3B-NVFP4,max_length=262144,base_url=http://0.0.0.0:8000/v1/chat/completions,num_concurrent=128,max_retries=3,tokenized_requests=False,tokenizer_backend=None,timeout=1200" \
 --num_fewshot 0 \
 --apply_chat_template \
 --gen_kwargs "do_sample=True,temperature=1.0,top_p=0.95,top_k=20,min_p=0.0,max_gen_toks=64000,presence_penalty=1.5,repetition_penalty=1.0,seed=5678"

Recovery:

	Qwen/Qwen3.6-35B-A3B	RedHatAI/Qwen3.6-35B-A3B-NVFP4 (this model)
Accuracy	95.62	96.28
Recovery	-	100.69%

Note: More rigorous evaluations are currently in progress and will be available soon.

Downloads last month: 2,567,961

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for RedHatAI/Qwen3.6-35B-A3B-NVFP4

Base model

Qwen/Qwen3.6-35B-A3B

Quantized

(492)

this model

Quantizations

1 model

URL: https://huggingface.co/RedHatAI/Qwen3.6-35B-A3B-NVFP4

⇱ RedHatAI/Qwen3.6-35B-A3B-NVFP4 · Hugging Face

NVFP4 Quantized RedHatAI/Qwen3.6-35B-A3B-NVFP4

Creation Script:

Preliminary Evaluations

Model tree for RedHatAI/Qwen3.6-35B-A3B-NVFP4