VOOZH about

URL: https://huggingface.co/RedHatAI/Qwen3-8B-speculator.peagle

⇱ RedHatAI/Qwen3-8B-speculator.peagle · Hugging Face


RedHatAI/Qwen3-8B-speculator.peagle

This is a DFlash speculator model for Qwen/Qwen3-8B.

Training Details

This model was trained using the Speculators library on a subset of Magpie-Align/Magpie-Llama-3.1-Pro-300K-Filtered and the train_sft split of HuggingFaceH4/ultrachat_200k. Responses were regenerated by Qwen3-8B (with reasoning).

Model Specifications

Base Model Qwen/Qwen3-8B
Chat Template Qwen/Qwen3-8B (use /chat/completions endpoint)
Format Safetensors
License Apache 2.0
Validation Hardware Nvidia H100

Deployment

# Install vLLM 
 
# Deploy with speculative decoding 
vllm serve RedHatAI/Qwen3-8B-speculator.peagle

Preliminary Evaluations

Per-position token acceptance rates across datasets:
(with reasoning enabled)

Dataset Pos 1 Pos 2 Pos 3 Pos 4 Pos 5 Pos 6 Pos 7 Avg Length
HumanEval 81.3% 59.0% 41.1% 27.9% 18.8% 12.8% 8.9% 3.500
math_reasoning 83.3% 63.5% 47.0% 34.3% 24.4% 17.2% 11.8% 3.820
qa 70.5% 44.7% 27.6% 17.1% 10.8% 7.1% 4.8% 2.830
question 74.6% 49.6% 31.6% 20.2% 13.1% 8.5% 5.6% 3.030
rag 73.6% 48.4% 29.8% 18.4% 11.3% 6.9% 4.1% 2.930
summarization 68.0% 39.0% 21.0% 10.8% 5.4% 2.6% 1.2% 2.480
tool_call 73.7% 47.6% 28.7% 17.1% 10.3% 6.2% 3.7% 2.870
translation 73.8% 47.7% 28.7% 17.3% 10.4% 6.5% 4.1% 2.890
writing 75.0% 50.0% 32.1% 20.6% 13.3% 8.7% 5.7% 3.050

References

Paper: P-EAGLE: Parallel-Drafting EAGLE with Scalable Training

Downloads last month
38
Safetensors
Model size
2B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for RedHatAI/Qwen3-8B-speculator.peagle

Finetuned
Qwen/Qwen3-8B
Finetuned
(1781)
this model

Collection including RedHatAI/Qwen3-8B-speculator.peagle

Paper for RedHatAI/Qwen3-8B-speculator.peagle