RedHatAI/Qwen3-8B-speculator.dflash

This is a DFlash speculator model for Qwen/Qwen3-8B.

Training Details

This model was trained using the Speculators library on a subset of Magpie-Align/Magpie-Llama-3.1-Pro-300K-Filtered and the train_sft split of HuggingFaceH4/ultrachat_200k. Responses were regenerated by Qwen3-8B (with reasoning). Training compute for this model was sponsored by Modal.

Model Specifications


Base Model	Qwen/Qwen3-8B
Chat Template	Qwen/Qwen3-8B (use `/chat/completions` endpoint)
Format	Safetensors
License	Apache 2.0
Validation Hardware	Nvidia H100

Deployment

# Install vLLM from the required PR
pip install git+https://github.com/vllm-project/vllm.git@refs/pull/41880/head 
 
# Deploy with speculative decoding 
vllm serve Qwen/Qwen3-8B \ 
 --tensor-parallel-size 1 \ 
 --max-model-len 16384 \ 
 --speculative-config '{ 
 "model": "RedHatAI/Qwen3-8B-speculator.dflash", 
 "num_speculative_tokens": 7, 
 "method": "dflash" 
 }'

Preliminary Evaluations

Per-position token acceptance rates across datasets:
(with reasoning enabled)

Dataset	Pos 1	Pos 2	Pos 3	Pos 4	Pos 5	Pos 6	Pos 7	Avg Length
HumanEval	79.9%	58.0%	40.3%	27.0%	17.8%	11.3%	6.8%	3.410
math_reasoning	82.2%	62.7%	46.2%	33.5%	23.4%	15.8%	9.9%	3.740
qa	68.9%	42.6%	25.0%	14.4%	8.1%	4.4%	2.3%	2.660
question	73.0%	47.6%	30.1%	18.9%	11.7%	7.1%	4.1%	2.930
rag	71.1%	44.8%	27.0%	15.7%	8.9%	4.9%	2.5%	2.750
summarization	65.5%	36.1%	19.0%	9.5%	4.7%	2.3%	1.1%	2.380
tool_call	71.3%	44.6%	25.8%	14.4%	7.8%	4.1%	2.1%	2.700
translation	63.8%	38.4%	22.1%	11.8%	6.1%	3.2%	1.5%	2.470
writing	73.2%	47.7%	30.1%	18.9%	11.8%	7.2%	4.2%	2.930

References

Paper: DFlash: Block Diffusion for Flash Speculative Decoding

Downloads last month: 919

Safetensors

Model size

2B params

Tensor type

I64

BF16

BOOL

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 2 Ask for provider support

Collection including RedHatAI/Qwen3-8B-speculator.dflash

21 items • Updated 10 days ago • 24

Paper for RedHatAI/Qwen3-8B-speculator.dflash

Paper • 2602.06036 • Published Feb 5 • 88

URL: https://huggingface.co/RedHatAI/Qwen3-8B-speculator.dflash

⇱ RedHatAI/Qwen3-8B-speculator.dflash · Hugging Face