URL: https://huggingface.co/arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled


Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled

Fine-tune of Gemma 4 E4B trained on Claude 4.6 Opus reasoning traces. The goal: take a compact 4B model and teach it to actually think before answering.

💡 What this is

Standard Gemma 4 E4B is already solid. This fine-tune pushes it toward a more deliberate, structured reasoning style by training on ~2.3k high-quality Chain-of-Thought samples distilled from Claude 4.6 Opus.

The model learns to plan inside <think> tags before committing to a final answer: fewer impulsive responses, more structured breakdowns.

<think>
1. What is actually being asked here?
2. What are the constraints and edge cases?
3. Step-by-step plan...
4. Verify the logic holds.
</think>

Final answer here.
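If you only want the final answer downstream, the reasoning block can be stripped in post-processing. A minimal sketch, assuming the model emits at most one literal <think>...</think> block as in the format above; the function name `split_reasoning` is hypothetical and not part of this repo:

```python
import re

# Assumption: at most one literal <think>...</think> block appears in the output.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no block is found."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the matched block is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


sample = "<think>\n1. Add the numbers.\n2. Check the sum.\n</think>\n\nFinal answer: 4"
reasoning, answer = split_reasoning(sample)
print(answer)
```

Output without a <think> block passes through unchanged, so the helper is safe to apply to every response.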

🗺️ Pipeline

google/gemma-4-E4B-it
 │
 ▼
SFT + QLoRA 4-bit (Unsloth)
 │ loss masked to responses only
 ▼
Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled
 │
 ▼
exported as GGUF (Q4_K_M + Q8_0)
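"Loss masked to responses only" means the prompt tokens stay in the input but contribute nothing to the loss. A toy sketch of the idea in plain Python, not the Unsloth implementation: it assumes the index where the response starts is already known, whereas a real helper has to locate that boundary itself. The name `mask_prompt_labels` is hypothetical.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_prompt_labels(token_ids, response_start):
    """Copy token_ids into labels, hiding every position before response_start."""
    labels = list(token_ids)
    for i in range(min(response_start, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels


# Toy sequence: the first 4 ids stand in for the prompt, the last 3 for the response.
token_ids = [11, 12, 13, 14, 21, 22, 23]
labels = mask_prompt_labels(token_ids, response_start=4)
print(labels)  # [-100, -100, -100, -100, 21, 22, 23]
```

Only the three response positions keep real labels, so gradients come from the reasoning and the answer, not from reproducing the prompt.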

⚙️ Training Details

| Parameter | Value |
| --- | --- |
| Base model | google/gemma-4-E4B-it |
| Framework | Unsloth |
| Method | SFT + QLoRA (4-bit) |
| Dataset | nohurry/Opus-4.6-Reasoning-3000x-filtered |
| Hardware | RTX 5060 Ti 16GB |
| LoRA rank / alpha | 16 / 16 |
| Epochs | 3 |
| Max seq length | 2048 |
| Optimizer | adamw_8bit |
| Learning rate | 2e-4 |
| LR scheduler | cosine |
| Loss masking | train_on_responses_only |
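To make two of these settings concrete: a cosine scheduler decays the learning rate from its peak toward zero over training, and rank 16 with alpha 16 gives the standard LoRA scaling factor alpha / rank = 1. A rough sketch under stated assumptions: no warmup, decay to exactly zero, and a made-up `total_steps`, none of which are given in the training details.

```python
import math

PEAK_LR = 2e-4   # learning rate from the training details
LORA_RANK = 16
LORA_ALPHA = 16


def cosine_lr(step, total_steps, peak_lr=PEAK_LR):
    """Cosine decay from peak_lr to 0. Assumes no warmup phase."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; the real step count depends on batch size
for step in (0, 500, 1000):
    print(step, cosine_lr(step, total_steps))

# Standard LoRA scales the adapter update by alpha / rank.
lora_scale = LORA_ALPHA / LORA_RANK
print(lora_scale)
```

The rate starts at 2e-4, passes roughly 1e-4 at the halfway point, and reaches 0 at the end.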

📚 Dataset

| Dataset | Description |
| --- | --- |
| nohurry/Opus-4.6-Reasoning-3000x-filtered | ~2.3k filtered Claude 4.6 Opus reasoning trajectories covering math, logic, and coding |

🚀 Run it

Ollama:

ollama run hf.co/arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled

llama.cpp:

./llama-cli -hf arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled \
 --temp 1.0 --top-p 0.95 --top-k 64
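To call the model from a script instead of the CLI, one option is Ollama's local HTTP API. A minimal sketch using only the standard library, assuming Ollama is running on its default port with the model already pulled. Reusing the llama.cpp sampling values here is an assumption, and `build_payload` and `generate` are hypothetical helper names.

```python
import json
import urllib.request

MODEL = "hf.co/arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled"
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(prompt):
    """Build a non-streaming generate request with the sampling settings above."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 1.0, "top_p": 0.95, "top_k": 64},
    }


def generate(prompt):
    """Send the request. Requires a running Ollama server with the model pulled."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Building the payload needs no server; call generate(...) once Ollama is up.
payload = build_payload("What is 17 * 23?")
print(json.dumps(payload, indent=2))
```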

✅ Good at

  • Multi-step math and logic problems
  • Code problem decomposition and debugging
  • Tasks where showing reasoning is more valuable than raw speed
  • Structured analysis of complex prompts

⚠️ Limitations

  • Text only: multimodal capabilities of the base model are not trained here
  • Small dataset: treat this as a focused reasoning fine-tune, not a general-purpose upgrade
  • Still an LLM: hallucinations happen, especially on factual recall outside the training domain

📜 License

Apache 2.0 + Gemma Terms of Use.

"Claude" is a trademark of Anthropic. This project is not affiliated with or endorsed by Anthropic; the name refers to the reasoning distillation data source only.

🙏 Acknowledgements

Thanks to Unsloth for making this feasible on consumer hardware, and to nohurry for the dataset.

📖 Citation

@misc{arsovskidev_gemma4_opus_distilled,
 title = {Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled},
 author = {arsovskidev},
 year = {2026},
 publisher = {Hugging Face},
 howpublished = {\url{https://huggingface.co/arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled}}
}