
URL: https://huggingface.co/WithinUsAI/Mellum2-Thinker.Uncensored-12B-A2.5B-gguf


Mellum2-Thinker.Uncensored-12B-A2.5B-GGUF

Repository: WithinUsAI/Mellum2-Thinker.Uncensored-12B-A2.5B-GGUF

Overview

Mellum2-Thinker.Uncensored-12B-A2.5B-GGUF is an uncensored community derivative of JetBrains Mellum2-12B-A2.5B-Thinking, converted to GGUF format for efficient local inference.

This release preserves the original reasoning-oriented behavior of Mellum2 Thinking while reducing alignment restrictions and refusals wherever possible. The model is intended for research, experimentation, creative writing, roleplay, agentic workflows, coding, reasoning, and unrestricted local AI deployments.

Like the original Mellum2 Thinking model, it produces reasoning traces within <think>...</think> blocks before generating a final answer. (Hugging Face)


Highlights

  • 🧠 Explicit reasoning with <think> traces
  • ⚡ MoE architecture with only ~2.5B active parameters per token
  • 📚 131K context length
  • 💻 Strong coding and software engineering capabilities
  • 🤖 Agent-friendly reasoning and planning
  • 🔓 Reduced alignment restrictions compared to the original release
  • 🦙 GGUF format for llama.cpp, KoboldCpp, LM Studio, Jan, Open WebUI, and Ollama-compatible ecosystems
  • 🏠 Designed for local and offline deployments

Model Architecture

Mellum2-Thinker.Uncensored inherits the architecture of the original Mellum2 Thinking model:

| Attribute | Value |
|---|---|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 12B |
| Active Parameters | 2.5B |
| Experts | 64 |
| Active Experts per Token | 8 |
| Layers | 28 |
| Hidden Size | 2304 |
| Context Length | 131,072 |
| Attention | Sliding Window + Full Attention |
| Vocabulary Size | 98,304 |
| Precision | BF16 Source |
| Format | GGUF |

(Hugging Face)
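
To make the "64 experts, 8 active per token" rows concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not taken from the Mellum2 implementation: the function name `route_token`, the random router scores, and the choice to softmax-normalize over only the selected experts are illustrative assumptions.

```python
import math
import random

NUM_EXPERTS = 64      # "Experts" row in the table above
ACTIVE_EXPERTS = 8    # "Active Experts per Token" row


def route_token(router_logits, k=ACTIVE_EXPERTS):
    """Pick the k highest-scoring experts and normalize their weights.

    Generic top-k gating; Mellum2's actual router may differ.
    """
    # Indices of the k largest logits.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only (an assumption of this sketch).
    peak = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


rng = random.Random(0)  # hypothetical router scores for a single token
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route_token(logits)

print(len(weights), "of", NUM_EXPERTS, "experts used")
print(f"fraction of experts active: {ACTIVE_EXPERTS / NUM_EXPERTS:.3f}")
```

Only one eighth of the experts run for a given token in this picture. The active-parameter share in the table (2.5B of 12B, roughly 21%) is higher than that, which is what one would expect if non-expert components such as attention and embeddings run for every token, though the card does not break the parameter count down.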


Intended Use

Mellum2-Thinker.Uncensored is best suited for:

  • Advanced reasoning
  • Multi-step problem solving
  • Agent frameworks
  • Coding assistance
  • Software engineering workflows
  • Autonomous task planning
  • Creative writing
  • Storytelling
  • Worldbuilding
  • Roleplay
  • Research
  • Knowledge exploration

The model is particularly effective when explicit reasoning and chain-of-thought style outputs are desired.


Prompt Format

Chat Format

<|im_start|>system
You are a helpful assistant.
<|im_end|>

<|im_start|>user
Explain recursion.
<|im_end|>

<|im_start|>assistant
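
The layout above can be assembled programmatically. The following is a small sketch, assuming the whitespace shown in the example is what the model expects; the function name `build_prompt` and the `(role, content)` input shape are hypothetical. If the GGUF file embeds its own chat template, that template is authoritative and most runtimes will apply it for you.

```python
def build_prompt(messages):
    """Render messages in the chat layout shown above.

    `messages` is a list of (role, content) pairs. The spacing mirrors
    the example in this card and is an assumption, not a verified spec.
    """
    parts = []
    for role, content in messages:
        parts.append(f"<|im_start|>{role}\n{content}\n<|im_end|>\n")
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    # Joining with "\n" puts one blank line between turns.
    return "\n".join(parts)


prompt = build_prompt([
    ("system", "You are a helpful assistant."),
    ("user", "Explain recursion."),
])
print(prompt)
```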

Thinking Example

User: Solve this problem.

Assistant:
<think>
Step-by-step reasoning...
</think>

Final answer...
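
Applications that only want the final answer usually need to separate it from the reasoning trace. Below is a minimal sketch of one way to do that with a regular expression; the helper name `split_reasoning` is hypothetical, and the code assumes at most one complete <think>...</think> block per response.

```python
import re

# Non-greedy match so only the first complete block is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer) from a model response.

    If no complete <think>...</think> block is present, reasoning is an
    empty string and the whole text is treated as the answer.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the matched block is kept as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


sample = "<think>\nStep-by-step reasoning...\n</think>\n\nFinal answer..."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

A response that is cut off before the closing </think> tag falls into the "no block" branch here, so a caller that streams output may want to handle that case separately.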

Quantization Information

This repository contains GGUF quantizations for local inference.

Typical recommendations:

| Quant | Recommended RAM/VRAM |
|---|---|
| Q4_K_M | 8-10 GB |

Actual memory requirements vary by context length and backend.
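
As a rough sanity check on the figure above, the size of the weights alone can be estimated from the parameter count and the average bits per weight. This is a back-of-the-envelope sketch: the value of about 4.85 bits per weight for Q4_K_M is an assumed approximation that does not come from this card, and the function name is hypothetical.

```python
def weight_memory_gb(total_params, bits_per_weight):
    """Approximate size of the quantized weights in gigabytes (10^9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 12e9   # total parameters stated in this card
Q4_K_M_BITS = 4.85    # assumed average bits per weight, not from this card

weights_gb = weight_memory_gb(TOTAL_PARAMS, Q4_K_M_BITS)
print(f"weights only: ~{weights_gb:.1f} GB")
```

Under these assumptions the weights take a little over 7 GB, which leaves the remainder of the 8-10 GB recommendation for the KV cache and runtime buffers. Note that the estimate uses all 12B parameters rather than the 2.5B active ones, since every expert typically has to be resident in memory even though only some run per token.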


Performance Characteristics

Mellum2 was designed as a high-efficiency focal reasoning model in which only 2.5B of its 12B total parameters are activated per token. This allows significantly faster inference than similarly sized dense models while retaining strong reasoning and coding capabilities. (arXiv)
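
The efficiency argument can be illustrated with the common rule of thumb that a forward pass costs about 2 floating-point operations per parameter used per token. The sketch below applies that rule to the two parameter counts; it is an idealized estimate under that assumption, and the function name is hypothetical.

```python
TOTAL_PARAMS = 12e9    # total parameters stated in this card
ACTIVE_PARAMS = 2.5e9  # parameters activated per token


def forward_flops_per_token(params_used):
    """Rule of thumb: about 2 FLOPs per parameter touched per token."""
    return 2 * params_used


moe = forward_flops_per_token(ACTIVE_PARAMS)
dense = forward_flops_per_token(TOTAL_PARAMS)

print(f"MoE:   {moe:.1e} FLOPs/token")
print(f"dense: {dense:.1e} FLOPs/token")
print(f"ratio: {dense / moe:.1f}x")
```

The ratio is an upper bound on the compute saving relative to a hypothetical 12B dense model. It ignores attention cost over long contexts, routing overhead, and memory bandwidth, so measured speedups on real hardware will differ.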


Differences From The Original Release

This repository is not an official JetBrains release.

Changes include:

  • Conversion to GGUF format
  • Community packaging for local inference
  • Reduced refusal behavior
  • Reduced alignment constraints
  • Intended for unrestricted research and experimentation
  • Preservation of reasoning-focused behavior

No affiliation with JetBrains is implied.


License

This derivative is based on Mellum2, which was released under the Apache 2.0 License. Please review the original license and ensure compliance with all applicable terms.

Original Model:

JetBrains/Mellum2-12B-A2.5B-Thinking

Original Technical Report:

Mellum2 Technical Report


Acknowledgements

Special thanks to JetBrains for releasing Mellum2 as an open-weight model and making advanced reasoning-focused MoE architectures available to the open-source AI community. (Hugging Face)

Maintained by: WithinUsAI
