VOOZH about

URL: https://huggingface.co/RedHatAI/Qwen3-8B-speculator.dflash

⇱ RedHatAI/Qwen3-8B-speculator.dflash · Hugging Face


RedHatAI/Qwen3-8B-speculator.dflash

This is a DFlash speculator model for Qwen/Qwen3-8B.

Training Details

This model was trained using the Speculators library on a subset of Magpie-Align/Magpie-Llama-3.1-Pro-300K-Filtered and the train_sft split of HuggingFaceH4/ultrachat_200k. Responses were regenerated by Qwen3-8B (with reasoning). Training compute for this model was sponsored by Modal.

Model Specifications

Base Model Qwen/Qwen3-8B
Chat Template Qwen/Qwen3-8B (use /chat/completions endpoint)
Format Safetensors
License Apache 2.0
Validation Hardware Nvidia H100

Deployment

# Install vLLM from the required PR
pip install git+https://github.com/vllm-project/vllm.git@refs/pull/41880/head 
 
# Deploy with speculative decoding 
vllm serve Qwen/Qwen3-8B \ 
 --tensor-parallel-size 1 \ 
 --max-model-len 16384 \ 
 --speculative-config '{ 
 "model": "RedHatAI/Qwen3-8B-speculator.dflash", 
 "num_speculative_tokens": 7, 
 "method": "dflash" 
 }'

Preliminary Evaluations

Per-position token acceptance rates across datasets:
(with reasoning enabled)

Dataset Pos 1 Pos 2 Pos 3 Pos 4 Pos 5 Pos 6 Pos 7 Avg Length
HumanEval 79.9% 58.0% 40.3% 27.0% 17.8% 11.3% 6.8% 3.410
math_reasoning 82.2% 62.7% 46.2% 33.5% 23.4% 15.8% 9.9% 3.740
qa 68.9% 42.6% 25.0% 14.4% 8.1% 4.4% 2.3% 2.660
question 73.0% 47.6% 30.1% 18.9% 11.7% 7.1% 4.1% 2.930
rag 71.1% 44.8% 27.0% 15.7% 8.9% 4.9% 2.5% 2.750
summarization 65.5% 36.1% 19.0% 9.5% 4.7% 2.3% 1.1% 2.380
tool_call 71.3% 44.6% 25.8% 14.4% 7.8% 4.1% 2.1% 2.700
translation 63.8% 38.4% 22.1% 11.8% 6.1% 3.2% 1.5% 2.470
writing 73.2% 47.7% 30.1% 18.9% 11.8% 7.2% 4.2% 2.930

References

Paper: DFlash: Block Diffusion for Flash Speculative Decoding

Downloads last month
919
Safetensors
Model size
2B params
Tensor type
I64
·
BF16
·
BOOL
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 2 Ask for provider support

Collection including RedHatAI/Qwen3-8B-speculator.dflash

Paper for RedHatAI/Qwen3-8B-speculator.dflash