Mellum2-Thinker.Uncensored-12B-A2.5B-GGUF

Repository: WithinUsAI/Mellum2-Thinker.Uncensored-12B-A2.5B-GGUF

Overview

Mellum2-Thinker.Uncensored-12B-A2.5B-GGUF is an uncensored community derivative of JetBrains' Mellum2-12B-A2.5B-Thinking, converted to GGUF format for efficient local inference.

This release preserves the original reasoning-oriented behavior of Mellum2 Thinking while reducing alignment restrictions and refusals wherever possible. The model is intended for research, experimentation, creative writing, roleplay, agentic workflows, coding, reasoning, and unrestricted local AI deployments.

Like the original Mellum2 Thinking model, this model produces a reasoning trace inside a <think>...</think> block before generating its final answer. (Hugging Face)

Highlights

🧠 Explicit reasoning with <think> traces
⚡ MoE architecture with only ~2.5B active parameters per token
📚 131K context length
💻 Strong coding and software engineering capabilities
🤖 Agent-friendly reasoning and planning
🔓 Reduced alignment restrictions compared to the original release
🦙 GGUF format for llama.cpp, KoboldCpp, LM Studio, Jan, Open WebUI, and Ollama-compatible ecosystems
🏠 Designed for local and offline deployments

Model Architecture

Mellum2-Thinker.Uncensored inherits the architecture of the original Mellum2 Thinking model:

Attribute	Value
Architecture	Mixture-of-Experts (MoE)
Total Parameters	12B
Active Parameters	2.5B
Experts	64
Active Experts per Token	8
Layers	28
Hidden Size	2304
Context Length	131,072
Attention	Sliding Window + Full Attention
Vocabulary Size	98,304
Source Precision	BF16
Format	GGUF

(Hugging Face)

Intended Use

Mellum2-Thinker.Uncensored is best suited for:

Advanced reasoning
Multi-step problem solving
Agent frameworks
Coding assistance
Software engineering workflows
Autonomous task planning
Creative writing
Storytelling
Worldbuilding
Roleplay
Research
Knowledge exploration

The model is particularly effective when explicit reasoning and chain-of-thought style outputs are desired.

Prompt Format

Chat Format

<|im_start|>system
You are a helpful assistant.
<|im_end|>

<|im_start|>user
Explain recursion.
<|im_end|>

<|im_start|>assistant
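
For callers that assemble prompts by hand, the layout above can be sketched as a small helper. This is a minimal sketch, not the official template: the function name `build_chat_prompt` is hypothetical, and the exact whitespace between turns is an assumption copied from the example above. The chat template embedded in the GGUF file remains the authority, and most runtimes apply it automatically.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chat_prompt(messages: List[Dict[str, str]]) -> str:
    """Render messages in the chat layout shown above and open an assistant turn.

    ASSUMPTION: turn separators and newlines mirror the example in this card,
    not a verified copy of the model's embedded chat template.
    """
    parts = []
    for message in messages:
        # Each turn: start marker plus role, the content, then the end marker.
        parts.append(f"{IM_START}{message['role']}\n{message['content']}\n{IM_END}\n")
    # Leave the assistant turn open so the model generates the reply.
    parts.append(f"{IM_START}assistant\n")
    # Joining with a newline reproduces the blank line between turns.
    return "\n".join(parts)


prompt = build_chat_prompt(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain recursion."},
    ]
)
print(prompt)
```

The helper takes the same role/content message dictionaries that most chat APIs use, so the same list can be passed to a runtime that applies the template itself.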

Thinking Example

User: Solve this problem.

Assistant:
<think>
Step-by-step reasoning...
</think>

Final answer...
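
Applications often want to hide or log the reasoning trace separately from the answer. The following is a minimal sketch of one way to split a completion shaped like the example above; `split_reasoning` is a hypothetical helper name, and it assumes at most one complete <think>...</think> block at the start of the output.

```python
import re
from typing import Tuple

# Non-greedy match so only the first think block is captured;
# DOTALL lets the trace span multiple lines.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(output: str) -> Tuple[str, str]:
    """Return (reasoning, answer) extracted from a raw model completion."""
    match = THINK_PATTERN.search(output)
    if match is None:
        # No complete think block found: treat the whole output as the answer.
        return "", output.strip()
    reasoning = match.group(1).strip()
    # Everything after the closing tag is treated as the final answer.
    answer = output[match.end():].strip()
    return reasoning, answer


raw = "<think>\nStep-by-step reasoning...\n</think>\n\nFinal answer..."
reasoning, answer = split_reasoning(raw)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

If generation is cut off before the closing tag, this sketch returns the raw text as the answer, so callers with tight token limits may want to handle an unterminated <think> separately.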

Quantization Information

This repository contains GGUF quantizations for local inference.

Typical recommendations:

Quant	Recommended RAM/VRAM
Q4_K_M	8-10 GB

Actual memory requirements vary by context length and backend.
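
As a rough cross-check of the figure above, the size of the quantized weights can be estimated from the parameter count and an average bits-per-weight value. This is a back-of-the-envelope sketch: the 4.85 bits per weight used for Q4_K_M is an assumed, commonly quoted average rather than a value measured on this model, and the estimate covers weights only, not the KV cache or runtime buffers.

```python
def estimate_weight_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes."""
    bytes_total = total_params * bits_per_weight / 8
    return bytes_total / 1e9


# 12B total parameters comes from the architecture table in this card.
TOTAL_PARAMS = 12e9
# ASSUMPTION: about 4.85 bits per weight is a typical average for Q4_K_M;
# the real figure depends on the tensor mix of the specific model.
Q4_K_M_BITS_PER_WEIGHT = 4.85

weights_gb = estimate_weight_gb(TOTAL_PARAMS, Q4_K_M_BITS_PER_WEIGHT)
print(f"Estimated Q4_K_M weights: {weights_gb:.1f} GB")
```

Under these assumptions the weights alone come to a little over 7 GB, which leaves the remainder of the 8-10 GB recommendation for context and backend overhead.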

Performance Characteristics

Mellum2 was designed as a high-efficiency focal reasoning model in which only 2.5B of its 12B total parameters are activated per token. This allows faster inference than dense models of similar total size while retaining strong reasoning and coding capabilities. (arXiv)
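
The sparsity behind this claim can be made concrete with the numbers from the architecture table. The sketch below is illustrative arithmetic only: the compute ratio assumes per-token forward-pass cost scales roughly with the number of active parameters, a common rule of thumb that ignores attention cost over long contexts and memory bandwidth, so it is not a measured speedup.

```python
# Figures taken from the architecture table in this card.
TOTAL_PARAMS = 12e9
ACTIVE_PARAMS = 2.5e9
NUM_EXPERTS = 64
ACTIVE_EXPERTS = 8


def fraction(part: float, whole: float) -> float:
    """Return part / whole as a plain ratio."""
    return part / whole


param_fraction = fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
expert_fraction = fraction(ACTIVE_EXPERTS, NUM_EXPERTS)
# ASSUMPTION: compute per token is proportional to active parameters,
# so this is an upper-level estimate relative to a dense 12B model.
compute_ratio = TOTAL_PARAMS / ACTIVE_PARAMS

print(f"Active parameters per token: {param_fraction:.1%}")
print(f"Active experts per token:    {expert_fraction:.1%}")
print(f"Approximate compute ratio vs. dense 12B: {compute_ratio:.1f}x")
```

The active-parameter share is higher than the active-expert share, most likely because components outside the expert layers, such as attention and embeddings, run for every token.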

Differences From The Original Release

This repository is not an official JetBrains release.

Changes include:

Conversion to GGUF format
Community packaging for local inference
Reduced refusal behavior
Reduced alignment constraints
Intended for unrestricted research and experimentation
Preservation of reasoning-focused behavior

No affiliation with JetBrains is implied.

License

This derivative is based on Mellum2, which was released under the Apache 2.0 License. Please review the original license and ensure compliance with all applicable terms.

Original Model:

JetBrains/Mellum2-12B-A2.5B-Thinking

Original Technical Report:

Mellum2 Technical Report

Acknowledgements

Special thanks to JetBrains for releasing Mellum2 as an open-weight model and making advanced reasoning-focused MoE architectures available to the open-source AI community. (Hugging Face)

Maintained by: WithinUsAI


Paper for WithinUsAI/Mellum2-Thinker.Uncensored-12B-A2.5B-gguf

Paper • 2605.31268 • Published May 29 • 58

URL: https://huggingface.co/WithinUsAI/Mellum2-Thinker.Uncensored-12B-A2.5B-gguf
