21 items • Updated • 24
RedHatAI/Qwen3-8B-speculator.dflash
This is a DFlash speculator model for Qwen/Qwen3-8B.
Training Details
This model was trained using the Speculators library on a subset of Magpie-Align/Magpie-Llama-3.1-Pro-300K-Filtered and the train_sft split of HuggingFaceH4/ultrachat_200k. Responses were regenerated by Qwen3-8B (with reasoning). Training compute for this model was sponsored by Modal.
Model Specifications
| Base Model | Qwen/Qwen3-8B |
| Chat Template | Qwen/Qwen3-8B (use /chat/completions endpoint) |
| Format | Safetensors |
| License | Apache 2.0 |
| Validation Hardware | Nvidia H100 |
Deployment
# Install vLLM from the required PR
pip install git+https://github.com/vllm-project/vllm.git@refs/pull/41880/head
# Deploy with speculative decoding
vllm serve Qwen/Qwen3-8B \
--tensor-parallel-size 1 \
--max-model-len 16384 \
--speculative-config '{
"model": "RedHatAI/Qwen3-8B-speculator.dflash",
"num_speculative_tokens": 7,
"method": "dflash"
}'
Preliminary Evaluations
Per-position token acceptance rates across datasets:
(with reasoning enabled)
| Dataset | Pos 1 | Pos 2 | Pos 3 | Pos 4 | Pos 5 | Pos 6 | Pos 7 | Avg Length |
|---|---|---|---|---|---|---|---|---|
| HumanEval | 79.9% | 58.0% | 40.3% | 27.0% | 17.8% | 11.3% | 6.8% | 3.410 |
| math_reasoning | 82.2% | 62.7% | 46.2% | 33.5% | 23.4% | 15.8% | 9.9% | 3.740 |
| qa | 68.9% | 42.6% | 25.0% | 14.4% | 8.1% | 4.4% | 2.3% | 2.660 |
| question | 73.0% | 47.6% | 30.1% | 18.9% | 11.7% | 7.1% | 4.1% | 2.930 |
| rag | 71.1% | 44.8% | 27.0% | 15.7% | 8.9% | 4.9% | 2.5% | 2.750 |
| summarization | 65.5% | 36.1% | 19.0% | 9.5% | 4.7% | 2.3% | 1.1% | 2.380 |
| tool_call | 71.3% | 44.6% | 25.8% | 14.4% | 7.8% | 4.1% | 2.1% | 2.700 |
| translation | 63.8% | 38.4% | 22.1% | 11.8% | 6.1% | 3.2% | 1.5% | 2.470 |
| writing | 73.2% | 47.7% | 30.1% | 18.9% | 11.8% | 7.2% | 4.2% | 2.930 |
References
Paper: DFlash: Block Diffusion for Flash Speculative Decoding
- Downloads last month
- 919
Safetensors
Model size
2B params
Tensor type
I64
·
BF16 ·
BOOL ·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 2 Ask for provider support
