Zhi-Create-Qwen3-32B-Eagle3

This is an EAGLE-3 speculator (draft) model for Zhihu-ai/Zhi-Create-Qwen3-32B, intended for speculative decoding. It was trained with the SpecForge library on a subset of the supervised fine-tuning (SFT) data of Zhihu-ai/Zhi-Create-Qwen3-32B, in both thinking and non-thinking modes.
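For readers new to speculative decoding, here is the basic draft-then-verify loop under greedy decoding. A cheap draft model proposes `k` tokens, and the target model checks them. The longest agreeing prefix is kept, plus one token from the target. This is a deliberately simplified, chain-shaped scheme with hypothetical toy models (`toy_target` and `toy_draft` are integer functions made up for illustration). EAGLE-3 itself drafts a token tree from the target model's hidden features and is not reproduced here.

```python
from typing import Callable, List

# A "model" here is any function mapping a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken, target: NextToken, k: int) -> List[int]:
    """One draft-then-verify round; returns the tokens appended to prefix."""
    # 1. Draft: the cheap model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. Verify: the target checks each proposed position. A real system scores all
    #    positions in one batched forward pass; this loop is for readability only.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            accepted.append(expected)  # first mismatch: take the target's token, stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. All k drafts accepted: the same target pass yields one extra "bonus" token.
    accepted.append(target(ctx))
    return accepted


def generate(prefix: List[int], draft: NextToken, target: NextToken, k: int, max_new: int) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prefix) + max_new]


# Hypothetical toy models: the target counts modulo 10; the draft agrees except after a 4.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


print(generate([0], toy_draft, toy_target, k=3, max_new=12))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
```

Every accepted token is one the target would have produced itself. The output therefore matches plain greedy decoding with the target. The draft only changes how many target passes are needed.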

You can serve Zhi-Create-Qwen3-32B together with this draft model using SGLang:

pip install "sglang[all]>=0.4.9"

# Launch Zhi-Create-Qwen3-32B with the EAGLE-3 draft model (--tp 2 shards the model across 2 GPUs)
python3 -m sglang.launch_server \
  --model Zhihu-ai/Zhi-Create-Qwen3-32B \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path Zhihu-ai/Zhi-Create-Qwen3-32B-Eagle3 \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 2 \
  --speculative-num-draft-tokens 8 \
  --tp 2 \
  --port 8000 \
  --dtype bfloat16 \
  --reasoning-parser deepseek-r1 \
  --served-model-name Zhi-Create-Qwen3-32B
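Building the launch command programmatically can help when sweeping the speculative settings, for example when trying several `--speculative-num-draft-tokens` values. The `build_launch_cmd` helper below is hypothetical and not part of SGLang. It only reproduces the flags from the command above, as a convenience sketch.

```python
import shlex
from typing import List


def build_launch_cmd(
    target: str = "Zhihu-ai/Zhi-Create-Qwen3-32B",
    draft: str = "Zhihu-ai/Zhi-Create-Qwen3-32B-Eagle3",
    *,
    num_steps: int = 3,
    eagle_topk: int = 2,
    num_draft_tokens: int = 8,
    tp: int = 2,
    port: int = 8000,
    served_name: str = "Zhi-Create-Qwen3-32B",
) -> List[str]:
    """Return the SGLang launch command above as an argv list (hypothetical helper)."""
    return [
        "python3", "-m", "sglang.launch_server",
        "--model", target,
        "--speculative-algorithm", "EAGLE3",
        "--speculative-draft-model-path", draft,
        "--speculative-num-steps", str(num_steps),
        "--speculative-eagle-topk", str(eagle_topk),
        "--speculative-num-draft-tokens", str(num_draft_tokens),
        "--tp", str(tp),
        "--port", str(port),
        "--dtype", "bfloat16",
        "--reasoning-parser", "deepseek-r1",
        "--served-model-name", served_name,
    ]


# Print shell-ready variants that differ only in the draft-token budget.
for n in (4, 8):
    print(shlex.join(build_launch_cmd(num_draft_tokens=n)))
```

To actually start a server, pass the list to `subprocess.Popen`. Any alternative values are tuning choices to benchmark on your own hardware.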

# Send a completion request.
# Prompt: "Write an article introducing West Lake vinegar fish, in the voice of Lu Xun."
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Zhi-Create-Qwen3-32B",
    "prompt": "请你以鲁迅的口吻，写一篇介绍西湖醋鱼的文章",
    "max_tokens": 4096,
    "temperature": 0.6,
    "top_p": 0.95
  }'
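For scripts without extra dependencies, the same request can be sent with Python's standard library. `build_completion_request` and `complete` are hypothetical helper names, and the payload mirrors the curl command. Reading `choices[0].text` assumes the OpenAI-style completions response format served at `/v1/completions`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # matches --port 8000 in the launch command


def build_completion_request(
    prompt: str,
    *,
    model: str = "Zhi-Create-Qwen3-32B",  # the --served-model-name value
    max_tokens: int = 4096,
    temperature: float = 0.6,
    top_p: float = 0.95,
) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server running, `print(complete("请你以鲁迅的口吻，写一篇介绍西湖醋鱼的文章"))` sends the same request as the curl command.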

# Alternative: use the OpenAI Python client against the server's OpenAI-compatible API
from openai import OpenAI

openai_api_key = "empty"  # placeholder value for the local server
openai_api_base = "http://127.0.0.1:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

def get_answer(messages):
    # Stream a chat completion with thinking mode on
    # (set enable_thinking to False for non-thinking mode).
    response = client.chat.completions.create(
        messages=messages,
        model="Zhi-Create-Qwen3-32B",
        max_tokens=4096,
        temperature=0.3,
        top_p=0.95,
        stream=True,
        extra_body={"chat_template_kwargs": {"enable_thinking": True}},
    )
    answer = ""
    reasoning_content_all = ""
    for each in response:
        if not each.choices:  # skip chunks that carry no choices
            continue
        delta = each.choices[0].delta
        # With --reasoning-parser, thinking text arrives in reasoning_content and the
        # final answer in content; either field may be missing or None in a chunk.
        each_content = getattr(delta, "content", None)
        reasoning_content = getattr(delta, "reasoning_content", None)
        if each_content is not None:
            answer += each_content
            print(each_content, end="", flush=True)
        if reasoning_content is not None:
            reasoning_content_all += reasoning_content
            print(reasoning_content, end="", flush=True)
    return answer, reasoning_content_all

# Prompt: "Write an article introducing West Lake vinegar fish, in the voice of Lu Xun."
prompt = "请你以鲁迅的口吻，写一篇介绍西湖醋鱼的文章"
messages = [
    {"role": "user", "content": prompt}
]

answer, reasoning_content_all = get_answer(messages)

Format: Safetensors
Model size: 2B params
Tensor types: I64, BF16, BOOL

Model tree: Qwen/Qwen3-32B (base model) → Zhihu-ai/Zhi-Create-Qwen3-32B (fine-tuned) → Zhihu-ai/Zhi-Create-Qwen3-32B-Eagle3 (this model)

Paper: arXiv 2503.01840 (EAGLE-3), published Mar 3, 2025

URL: https://huggingface.co/Zhihu-ai/Zhi-Create-Qwen3-32B-Eagle3
