Mellum2-Thinker.Uncensored-12B-A2.5B-GGUF
Repository: WithinUsAI/Mellum2-Thinker.Uncensored-12B-A2.5B-GGUF
Overview
Mellum2-Thinker.Uncensored-12B-A2.5B-GGUF is an uncensored community derivative of JetBrains' Mellum2-12B-A2.5B-Thinking, converted to GGUF format for efficient local inference.
This release preserves the original reasoning-oriented behavior of Mellum2 Thinking while reducing alignment restrictions and refusals wherever possible. The model is intended for research, experimentation, creative writing, roleplay, agentic workflows, coding, reasoning, and unrestricted local AI deployments.
Like the original Mellum2 Thinking model, this model produces reasoning traces within <think>...</think> blocks before generating a final answer. (Hugging Face)
Highlights
- 🧠 Explicit reasoning with <think> traces
- ⚡ MoE architecture with only ~2.5B active parameters per token
- 📚 131K context length
- 💻 Strong coding and software engineering capabilities
- 🤖 Agent-friendly reasoning and planning
- 🔓 Reduced alignment restrictions compared to the original release
- 🦙 GGUF format for llama.cpp, KoboldCpp, LM Studio, Jan, Open WebUI, and Ollama-compatible ecosystems
- 🏠 Designed for local and offline deployments
Model Architecture
Mellum2-Thinker.Uncensored inherits the architecture of the original Mellum2 Thinking model:
| Attribute | Value |
|---|---|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 12B |
| Active Parameters | 2.5B |
| Experts | 64 |
| Active Experts per Token | 8 |
| Layers | 28 |
| Hidden Size | 2304 |
| Context Length | 131,072 |
| Attention | Sliding Window + Full Attention |
| Vocabulary Size | 98,304 |
| Precision | BF16 Source |
| Format | GGUF |
Intended Use
Mellum2-Thinker.Uncensored is best suited for:
- Advanced reasoning
- Multi-step problem solving
- Agent frameworks
- Coding assistance
- Software engineering workflows
- Autonomous task planning
- Creative writing
- Storytelling
- Worldbuilding
- Roleplay
- Research
- Knowledge exploration
The model is particularly effective when explicit reasoning and chain-of-thought-style outputs are desired.
Prompt Format
Chat Format
<|im_start|>system
You are a helpful assistant.
<|im_end|>
<|im_start|>user
Explain recursion.
<|im_end|>
<|im_start|>assistant
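For runtimes that take a raw prompt string instead of applying a chat template, the layout above can be assembled programmatically. The following is a minimal sketch that mirrors the layout exactly as shown; `build_prompt` is a hypothetical helper, not part of any library, and the exact whitespace around `<|im_end|>` is an assumption taken from the example above. Where the runtime applies the chat template embedded in the GGUF file, that template should take precedence.

```python
def build_prompt(messages):
    """Render a list of {"role", "content"} dicts in the chat layout shown above.

    Hypothetical helper: the whitespace around <|im_end|> follows the example
    in this card and is an assumption, not a verified template.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}\n<|im_end|>\n"
        )
    # Leave the assistant turn open so the model continues from this point.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain recursion."},
])
print(prompt)
```

The prompt ends with an open assistant turn, so the first tokens generated are expected to be the model's reasoning trace followed by its answer.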
Thinking Example
User: Solve this problem.
Assistant:
<think>
Step-by-step reasoning...
</think>
Final answer...
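Applications that show only the final answer usually need to separate it from the reasoning trace. A minimal sketch, assuming the completion contains at most one well-formed <think>...</think> block as in the example above; `split_reasoning` is a hypothetical helper name:

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured;
# DOTALL lets the trace span multiple lines.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer) for a completion that may contain a <think> block."""
    match = THINK_PATTERN.search(text)
    if match is None:
        # No trace found: treat the whole completion as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


sample = "<think>\nStep-by-step reasoning...\n</think>\nFinal answer..."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

If generation is cut off before the closing tag, this sketch returns the raw text as the answer; a production parser would likely want to handle that case explicitly.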
Quantization Information
This repository contains GGUF quantizations for local inference.
Typical recommendations:
| Quant | Recommended RAM/VRAM |
|---|---|
| Q4_K_M | 8-10 GB |
Actual memory requirements vary by context length and backend.
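As a rough sanity check on the figure in the table, the weight footprint can be estimated from the total parameter count and the average bits per weight of the quantization. This is a back-of-the-envelope sketch: the value of about 4.85 bits per weight for Q4_K_M is an assumption (the real average depends on the tensor mix), and the estimate excludes the KV cache and runtime buffers. The total parameter count is used rather than the active count because all experts generally need to be resident, even though only some are used per token.

```python
def estimate_weight_gb(total_params, bits_per_weight):
    """Approximate size of the quantized weights in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 12e9   # total parameters, from the architecture table
Q4_K_M_BITS = 4.85    # ASSUMPTION: rough average bits per weight for Q4_K_M

weights_gb = estimate_weight_gb(TOTAL_PARAMS, Q4_K_M_BITS)
print(f"approximate weight size: {weights_gb:.1f} GB")
```

Under these assumptions the weights alone come to a little over 7 GB, which leaves the rest of the 8-10 GB recommendation for context and backend overhead.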
Performance Characteristics
Mellum2 was designed as a high-efficiency focal reasoning model: it contains 12B total parameters, but only 2.5B are activated per token. This allows significantly faster inference than a dense model of similar total size while retaining strong reasoning and coding capabilities. (arXiv)
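The sparsity described here can be made concrete with the numbers from the architecture table. A small illustrative calculation (the variable names are for illustration only):

```python
# Values taken from the architecture table in this card.
TOTAL_PARAMS_B = 12.0
ACTIVE_PARAMS_B = 2.5
EXPERTS = 64
ACTIVE_EXPERTS = 8

# Share of all parameters that take part in computing a single token.
active_param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
# Share of experts selected by the router for a single token.
active_expert_fraction = ACTIVE_EXPERTS / EXPERTS

print(f"active parameters per token: {active_param_fraction:.1%}")
print(f"active experts per token:    {active_expert_fraction:.1%}")
```

Roughly a fifth of the parameters are used per token, while an eighth of the experts are selected. The parameter share is presumably higher than the expert share because non-expert components such as attention and embeddings run for every token; this card does not give that breakdown.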
Differences From The Original Release
This repository is not an official JetBrains release.
Changes include:
- Conversion to GGUF format
- Community packaging for local inference
- Reduced refusal behavior
- Reduced alignment constraints
- Intended for unrestricted research and experimentation
- Preservation of reasoning-focused behavior
No affiliation with JetBrains is implied.
License
This derivative is based on Mellum2, which was released under the Apache 2.0 License. Please review the original license and ensure compliance with all applicable terms.
Original Model:
JetBrains/Mellum2-12B-A2.5B-Thinking
Original Technical Report:
Acknowledgements
Special thanks to JetBrains for releasing Mellum2 as an open-weight model and making advanced reasoning-focused MoE architectures available to the open-source AI community. (Hugging Face)
Maintained by: WithinUsAI
