Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled

Fine-tune of Gemma 4 E4B trained on Claude 4.6 Opus reasoning traces. The goal: take a compact 4B model and teach it to actually think before answering.

💡 What this is

Standard Gemma 4 E4B is already solid. This fine-tune pushes it toward a more deliberate, structured reasoning style by training on ~2.3k high-quality Chain-of-Thought samples distilled from Claude 4.6 Opus.

The model learns to plan inside <think> tags before committing to a final answer — fewer impulsive responses, more structured breakdowns.

<think>
1. What is actually being asked here?
2. What are the constraints and edge cases?
3. Step-by-step plan...
4. Verify the logic holds.
</think>

Final answer here.

🗺️ Pipeline

google/gemma-4-E4B-it
 │
 ▼
SFT + QLoRA 4-bit (Unsloth)
 │ loss masked to responses only
 ▼
Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled
 │
 ▼
exported as GGUF (Q4_K_M + Q8_0)

⚙️ Training Details

Parameter	Value
Base model	google/gemma-4-E4B-it
Framework	Unsloth
Method	SFT + QLoRA (4-bit)
Dataset	nohurry/Opus-4.6-Reasoning-3000x-filtered
Hardware	RTX 5060 Ti 16GB
LoRA rank / alpha	16 / 16
Epochs	3
Max seq length	2048
Optimizer	adamw_8bit
Learning rate	2e-4
LR scheduler	cosine
Loss masking	train_on_responses_only

📚 Dataset

Dataset	Description
nohurry/Opus-4.6-Reasoning-3000x-filtered	~2.3k filtered Claude 4.6 Opus reasoning trajectories covering math, logic, and coding

🚀 Run it

Ollama:

ollama run hf.co/arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled

llama.cpp:

./llama-cli -hf arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled \
 --temp 1.0 --top-p 0.95 --top-k 64

✅ Good at

Multi-step math and logic problems
Code problem decomposition and debugging
Tasks where showing reasoning is more valuable than raw speed
Structured analysis of complex prompts

⚠️ Limitations

Text only — multimodal capabilities of the base model are not trained here
Small dataset — treat this as a focused reasoning fine-tune, not a general-purpose upgrade
Still an LLM — hallucinations happen, especially on factual recall outside the training domain

📜 License

Apache 2.0 + Gemma Terms of Use.

"Claude" is a trademark of Anthropic. This project is not affiliated with or endorsed by Anthropic — the name refers to the reasoning distillation data source only.

🙏 Acknowledgements

Unsloth for making this feasible on consumer hardware, and nohurry for the dataset.

📖 Citation

@misc{arsovskidev_gemma4_opus_distilled,
 title = {Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled},
 author = {arsovskidev},
 year = {2026},
 publisher = {Hugging Face},
 howpublished = {\url{https://huggingface.co/arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled}}
}

Downloads last month: 3,381

Safetensors

Model size

8B params

Tensor type

BF16

Model tree for arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled

Base model

google/gemma-4-E4B

Finetuned

google/gemma-4-E4B-it

Quantized

(242)

this model

Finetunes

2 models

Quantizations

3 models

URL: https://huggingface.co/arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled

⇱ arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled · Hugging Face