
Mellum2 Base

Use this checkpoint as the starting point for your own fine-tuning, alignment, or domain adaptation on top of the long-context base. For instruction-following or reasoning tasks out of the box, use the Instruct or Thinking variants instead.

Mellum2 Base Highlights

Mellum2 Base is a long-context pretrained causal language model trained by JetBrains.

The model uses a Mixture-of-Experts architecture with 64 experts and activates 8 experts per token. It uses a combination of sliding-window and full attention layers, with a context length of 131,072 tokens.
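To illustrate what "64 experts, 8 activated per token" means, here is a minimal top-k routing sketch in plain Python. The gating rule (softmax over the selected experts' logits only), the toy scalar "experts", and the random router logits are assumptions for illustration; the card does not describe Mellum2's actual router.

```python
import math
import random

NUM_EXPERTS = 64  # from the model card
TOP_K = 8         # experts activated per token, from the model card


def route_token(router_logits: list[float], k: int = TOP_K) -> list[tuple[int, float]]:
    """Select the k highest-scoring experts for one token and renormalize.

    ASSUMPTION: softmax over the selected logits only. The card does not
    describe Mellum2's actual gating or normalization scheme.
    """
    if len(router_logits) != NUM_EXPERTS:
        raise ValueError(f"expected {NUM_EXPERTS} router logits, got {len(router_logits)}")
    # Expert indices sorted by logit, highest first; keep the top k.
    top = sorted(range(NUM_EXPERTS), key=lambda i: router_logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = router_logits[top[0]]
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: float, router_logits: list[float], experts) -> float:
    """Mix the outputs of the k routed experts; the other experts never run."""
    return sum(weight * experts[i](x) for i, weight in route_token(router_logits))


# Toy usage: each hypothetical "expert" just scales its input.
experts = [lambda x, s=s: s * x for s in range(NUM_EXPERTS)]
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routes = route_token(logits)
print("chosen experts:", [i for i, _ in routes])
print("output:", moe_forward(1.0, logits, experts))
```

Only the 8 selected experts execute for each token, which is what lets the active parameter count (the "A2.5B" in the checkpoint name, by the usual naming convention) sit well below the 12B total.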

This is the long-context base, produced from Mellum2-12B-A2.5B-Base-Pretrain by a layer-selective YaRN extension stage that re-maps RoPE frequencies on the global-attention layers only. It is the shared starting point for the released Instruct and Thinking variants.
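The layer-selective YaRN stage can be sketched as follows. It uses the "NTK-by-parts" frequency remapping from the YaRN paper, with that paper's suggested ramp bounds (alpha=1, beta=32) and its 0.1·ln(s)+1 attention factor. The head dimension, RoPE base, pre-extension context length, and which layers are global are all hypothetical values, since the card does not state them; this is an illustration of the technique, not JetBrains' implementation.

```python
import math

NUM_LAYERS = 28        # from the card
TARGET_CTX = 131_072   # from the card
# HYPOTHETICAL values for illustration; the card does not state them.
HEAD_DIM = 128
ROPE_BASE = 10_000.0
ORIG_CTX = 32_768      # assumed pre-extension context length
LAYER_IS_GLOBAL = [(i + 1) % 4 == 0 for i in range(NUM_LAYERS)]  # assumed layer pattern


def rope_inv_freq(head_dim: int, base: float) -> list[float]:
    """Standard RoPE inverse frequencies theta_i = base ** (-2i / d)."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def yarn_inv_freq(head_dim: int, base: float, scale: float, orig_ctx: int,
                  alpha: float = 1.0, beta: float = 32.0) -> list[float]:
    """YaRN "NTK-by-parts" remapping of RoPE frequencies.

    Dimensions that rotate more than `beta` times within the original context
    keep their frequency; those rotating fewer than `alpha` times are divided
    by `scale` (position interpolation); a linear ramp blends in between.
    """
    remapped = []
    for theta in rope_inv_freq(head_dim, base):
        rotations = orig_ctx * theta / (2 * math.pi)  # original context / wavelength
        gamma = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        remapped.append((1 - gamma) * theta / scale + gamma * theta)
    return remapped


def yarn_attention_factor(scale: float) -> float:
    """YaRN's attention temperature factor, 0.1 * ln(s) + 1 (1.0 if s <= 1)."""
    return 0.1 * math.log(scale) + 1.0 if scale > 1 else 1.0


def per_layer_inv_freq(layer_is_global: list[bool]) -> list[list[float]]:
    """Layer-selective extension: remap only the global-attention layers;
    sliding-window layers keep the original RoPE frequencies."""
    scale = TARGET_CTX / ORIG_CTX
    plain = rope_inv_freq(HEAD_DIM, ROPE_BASE)
    yarn = yarn_inv_freq(HEAD_DIM, ROPE_BASE, scale, ORIG_CTX)
    return [yarn if is_global else plain for is_global in layer_is_global]


tables = per_layer_inv_freq(LAYER_IS_GLOBAL)
print("global layers:", [i for i, g in enumerate(LAYER_IS_GLOBAL) if g])
print("attention factor:", round(yarn_attention_factor(TARGET_CTX / ORIG_CTX), 4))
```

A plausible reading of "global-attention layers only": sliding-window layers never see relative distances beyond their window, so their original frequencies already cover every distance they encounter, and only the full-attention layers face unseen distances at 131,072 tokens.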

Mellum2 Model Family

This repository contains one checkpoint from the Mellum2 family.

Checkpoint	Description
Base Pretrain	Base checkpoint before long-context extension
Base	Final base model
Instruct SFT	Supervised instruction-tuned checkpoint
Thinking SFT	Supervised thinking checkpoint
Instruct	RL-tuned instruction model
Thinking	RL-tuned thinking model

Model Overview

Mellum2 Base has the following features:

Number of Layers: 28
Hidden Size: 2304
Intermediate Size: 7168
MoE Intermediate Size: 896
Number of Experts: 64
Number of Activated Experts: 8
Number of Attention Heads (GQA): 32 for Q and 4 for KV
Context Length: 131,072
Sliding Window: 1,024
Vocabulary Size: 98,304
Precision: bfloat16
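Two of the attention entries above can be made concrete with a small sketch: the GQA grouping of 32 query heads onto 4 KV heads, and the causal mask rule for full-attention versus sliding-window (1,024) layers. The contiguous head grouping and the inclusive window convention are assumptions; the card does not specify either.

```python
CONTEXT_LENGTH = 131_072  # from the card
SLIDING_WINDOW = 1_024    # from the card
NUM_Q_HEADS = 32          # from the card
NUM_KV_HEADS = 4          # from the card


def kv_head_for(q_head: int) -> int:
    """GQA: consecutive groups of query heads share one KV head.

    ASSUMPTION: contiguous grouping (query heads 0-7 -> KV head 0, ...),
    as in common GQA implementations.
    """
    group_size = NUM_Q_HEADS // NUM_KV_HEADS  # 8 query heads per KV head
    return q_head // group_size


def can_attend(q_pos: int, k_pos: int, sliding: bool, window: int = SLIDING_WINDOW) -> bool:
    """Causal mask entry for one (query, key) pair of token positions.

    Full-attention layers see every earlier token; sliding-window layers see
    only the most recent `window` positions, counting the query itself.
    ASSUMPTION: this inclusive window convention; implementations differ by one.
    """
    if k_pos > q_pos:
        return False  # causal: no attending to future tokens
    return (not sliding) or (q_pos - k_pos < window)


def visible_keys(q_pos: int, sliding: bool, window: int = SLIDING_WINDOW) -> int:
    """How many key positions the query at 0-based `q_pos` can attend to."""
    return min(q_pos + 1, window) if sliding else q_pos + 1


last = CONTEXT_LENGTH - 1
print("full-attention layer, last token:", visible_keys(last, sliding=False))
print("sliding-window layer, last token:", visible_keys(last, sliding=True))
```

Under this rule a sliding-window layer needs at most 1,024 cached key/value positions no matter how long the input is, while a full-attention layer may need all 131,072.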

Serving with vLLM

```shell
vllm serve JetBrains/Mellum2-12B-A2.5B-Base --max-model-len 131072
```

Quickstart

Text-Only Input (base model — use the completions endpoint, not chat)

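```python
from openai import OpenAI

# The client reads OPENAI_BASE_URL (e.g. http://localhost:8000/v1 for the vLLM
# server above) and OPENAI_API_KEY (any non-empty value if vLLM was started
# without --api-key) from the environment.
client = OpenAI()

completion = client.completions.create(
    model="JetBrains/Mellum2-12B-A2.5B-Base",
    prompt="def fibonacci(n):\n ",
    max_tokens=81920,
    temperature=0.6,
    top_p=0.95,
    extra_body={
        "top_k": 20,  # vLLM-specific sampling parameter, not part of the OpenAI schema
    },
)
# Print only the generated text rather than the full response object.
print("Completion:", completion.choices[0].text)
```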
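Because this is a base model, the request goes to the plain completions endpoint and the prompt is raw text to be continued, with no chat template. The sketch below sends the same kind of request using only the Python standard library and adds optional `stop` sequences (a standard field of the completions API), since a base model may otherwise keep generating until it reaches `max_tokens`. The default `max_tokens=256`, the example stop strings, and the `localhost:8000` fallback URL are illustrative assumptions.

```python
import json
import os
import urllib.request
from typing import Optional

# ASSUMPTION: the `vllm serve` command above, on vLLM's default port 8000.
BASE_URL = os.environ.get("OPENAI_BASE_URL", "http://localhost:8000/v1")
MODEL = "JetBrains/Mellum2-12B-A2.5B-Base"


def build_completion_body(prompt: str, max_tokens: int = 256,
                          stop: Optional[list[str]] = None) -> dict:
    """JSON body for POST /v1/completions.

    The prompt is raw text that the base model simply continues; there is no
    chat template or role formatting.
    """
    body = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,  # illustrative cap, far below the quickstart's
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,  # vLLM accepts this alongside the standard OpenAI fields
    }
    if stop:
        body["stop"] = stop  # generation halts before any of these strings
    return body


def complete(prompt: str, **kwargs) -> str:
    """Send one completion request and return the generated continuation."""
    headers = {"Content-Type": "application/json"}
    api_key = os.environ.get("OPENAI_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    request = urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(build_completion_body(prompt, **kwargs)).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["choices"][0]["text"]


# Example (requires a running server):
# print(complete("def fibonacci(n):\n", stop=["\ndef ", "\nclass "]))
```

Stopping at the next top-level `def` or `class` is one simple way to get a single function body back from a code-completion base model; tune the stop strings to your prompt format.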

Evaluation

Mellum2 Base pretraining results compared with similarly sized open base models. All values are self-reported by JetBrains.

Benchmark	Mellum2 (12B-A2.5B)	OLMo-3 (7B)	Qwen2.5 (7B)	Qwen3 (4B)	Qwen3.5 (4B)
Code Generation
HumanEval	41.5	45.1	55.5	57.3	50.0
HumanEval+	37.2	39.6	47.0	51.2	43.9
MBPP	62.4	50.6	63.6	67.0	52.2
MBPP+	61.4	52.9	64.0	64.5	55.0
MultiPL-E (7 langs)	21.0	10.0	19.2	26.0	12.1
CRUXEval-I	45.4	38.8	44.0	44.6	49.1
CRUXEval-O	43.9	36.6	42.9	43.5	43.2
Knowledge & Reasoning
MMLU	70.9	62.1	71.8	71.1	74.2
MMLU-Pro	59.3	34.5	48.6	51.5	52.4
BBH	74.9	63.6	69.0	71.3	80.2
ARC-Challenge	53.5	53.6	51.3	51.2	54.9
HellaSwag	73.7	74.2	78.9	73.7	75.3
WinoGrande	65.5	69.5	73.3	71.2	70.8
TruthfulQA MC2	44.5	47.0	56.4	53.5	52.1
Math & Science
GSM8K	81.7	73.5	81.9	82.0	80.1
MATH	10.0	18.7	24.6	27.7	25.3
GPQA Diamond	31.3	28.8	32.8	36.9	41.4
GPQA Main	35.0	27.9	34.2	36.8	40.2

For more details, see the Mellum2 Technical Report.

License

Released under the Apache 2.0 license.


Paper: 2605.31268

Article: Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

Evaluation results (self-reported)

GSM8K (openai/gsm8k): 81.73
GPQA Diamond (Idavidrein/gpqa), pre-training eval (pre-YaRN), no tools: 31.31
GPQA Main (Idavidrein/gpqa), pre-training eval (pre-YaRN), no tools: 35.04
MMLU-Pro (TIGER-Lab/MMLU-Pro): 59.31
HumanEval pass@1: 41.46
HumanEval+ pass@1: 37.20
MBPP pass@1: 62.40

URL: https://huggingface.co/JetBrains/Mellum2-12B-A2.5B-Base
