📃 License • 💻 Code • 📑 Technical Report • 📊 Benchmarks

🎯 Brief Introduction

Youtu-LLM is a small yet powerful LLM: it contains only 1.96B parameters, supports a 128k-token context, and has native agentic capabilities. On general evaluations, Youtu-LLM significantly outperforms state-of-the-art LLMs of similar size in commonsense, STEM, coding, and long-context capabilities; on agent-related benchmarks, it surpasses larger models and can complete multiple end-to-end agent tasks.

Youtu-LLM has the following features:

Type: Autoregressive causal language model with dense MLA (Multi-head Latent Attention)
Release versions: Base and Instruct
Number of Parameters: 1.96B
Number of Layers: 32
Number of Attention Heads (MLA): 16 for Q/K/V
MLA Rank: 1,536 for Q, 512 for K/V
MLA Dim: 128 for QK Nope, 64 for QK Rope, and 128 for V
Context Length: 131,072
Vocabulary Size: 128,256
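
To see what these attention dimensions imply for long-context memory, the sketch below estimates the per-token KV-cache footprint from the numbers listed above. It assumes, as in the original Multi-head Latent Attention design, that inference caches only the compressed K/V latent (rank 512) plus one shared RoPE key (64 dims) per layer, and compares that with caching full per-head keys and values in BF16. Whether a given runtime actually uses the compressed cache is an assumption, not something stated in this card, and all constant and function names are illustrative.

```python
# Back-of-the-envelope KV-cache estimate from the architecture numbers above.
# ASSUMPTION: the runtime caches the compressed MLA latent plus one shared
# RoPE key per layer. This card does not confirm how the cache is implemented.

NUM_LAYERS = 32
NUM_HEADS = 16
KV_RANK = 512            # MLA rank for K/V
QK_NOPE_DIM = 128        # per-head non-rotary Q/K dimension
QK_ROPE_DIM = 64         # rotary Q/K dimension
V_DIM = 128              # per-head V dimension
CONTEXT_LENGTH = 131_072
BYTES_PER_VALUE = 2      # BF16


def latent_cache_bytes_per_token() -> int:
    """Bytes per token when caching the K/V latent and the shared RoPE key."""
    return NUM_LAYERS * (KV_RANK + QK_ROPE_DIM) * BYTES_PER_VALUE


def full_cache_bytes_per_token() -> int:
    """Bytes per token when caching full per-head keys and values instead."""
    per_head = (QK_NOPE_DIM + QK_ROPE_DIM) + V_DIM
    return NUM_LAYERS * NUM_HEADS * per_head * BYTES_PER_VALUE


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


if __name__ == "__main__":
    latent = latent_cache_bytes_per_token() * CONTEXT_LENGTH
    full = full_cache_bytes_per_token() * CONTEXT_LENGTH
    print(f"latent cache at full context: {to_gib(latent):.1f} GiB")
    print(f"full cache at full context:   {to_gib(full):.1f} GiB")
```

Under these assumptions the compressed cache is roughly 9x smaller than a full per-head cache for a single sequence at the maximum context length.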

🤗 Model Download

Model Name	Description	Download
Youtu-LLM-2B-Base	Base model of Youtu-LLM-2B	🤗 Model
Youtu-LLM-2B	Instruct model of Youtu-LLM-2B	🤗 Model
Youtu-LLM-2B-GGUF	Instruct model of Youtu-LLM-2B, in GGUF format	🤗 Model

📰 News

[2026.01.28] You can now directly use Youtu-LLM with Transformers>=5.1.0.
[2026.01.07] You can now fine-tune Youtu-LLM with ModelScope.
[2026.01.04] You can now fine-tune Youtu-LLM with LlamaFactory.

Note: If you wish to use Youtu-LLM-2B-Base with earlier versions of transformers (>=4.56.0,<=4.57.1), please make sure to download a revision of the model repository from before this commit.
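
The version constraints above can be checked before loading the model. The following is a minimal sketch, not an official loader: `support_mode` and `complete` are hypothetical helper names, the repository id is taken from this page's URL, and versions outside the two stated ranges are reported as "unverified" because this card says nothing about them. The loading path uses only the standard `AutoTokenizer` / `AutoModelForCausalLM` calls and is shown for the native (Transformers>=5.1.0) case.

```python
import re

NATIVE_MIN = (5, 1, 0)    # "Transformers>=5.1.0" from the news entry
LEGACY_MIN = (4, 56, 0)   # ">=4.56.0,<=4.57.1" from the note
LEGACY_MAX = (4, 57, 1)
REPO_ID = "tencent/Youtu-LLM-2B-Base"


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'X.Y' or 'X.Y.Z' (pre-release suffixes are ignored)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def support_mode(version: str) -> str:
    """Classify a transformers version against the ranges stated in this card."""
    parsed = parse_version(version)
    if parsed >= NATIVE_MIN:
        return "native"      # current repository revision should work
    if LEGACY_MIN <= parsed <= LEGACY_MAX:
        return "legacy"      # needs an older repository revision (see note)
    return "unverified"      # not covered by this card


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Plain text completion with the base model (downloads the weights)."""
    import transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer

    if support_mode(transformers.__version__) != "native":
        raise RuntimeError("this sketch only covers Transformers>=5.1.0")
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Because this is the base model rather than the instruct model, `complete` passes raw text and does not apply a chat template.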

📊 Performance Comparisons

Base Model

[Figure: Comparison between Youtu-LLM-2B-Base and baselines]

General Benchmarks

Type	Benchmark (Metric)	# Shots	Qwen3-1.7B-Base	SmolLM3-3B-Base	Gemma3-4B-Base	Qwen3-4B-Base	Llama3.1-8B
Commonsense	MMLU-Pro (EM)	5	34.9%	35.3%	29.4%	36.2%	48.4%
Commonsense	MLQA-Zh (EM)	3	38.1%	38.0%	40.3%	47.2%	43.0%
Commonsense	MMLU-ProX-Zh (EM)	5	32.5%	26.7%	24.2%	45.2%	25.4%
STEM	GSM8K (EM)	8	68.2%	67.3%	38.5%	80.8%	47.8%
STEM	MGSM-Zh (EM)	8	57.1%	40.7%	33.0%	69.7%	35.9%
STEM	MATH (EM)	4	28.1%	40.8%	24.4%	44.8%	21.5%
STEM	BBH (EM)	3	53.0%	59.8%	51.6%	70.8%	59.8%
STEM	GPQA-MC (Acc. Norm)	5	30.4%	26.6%	28.6%	37.8%	30.1%
STEM	HLE-MC (Acc. Norm)	3	10.7%	3.1%	8.0%	11.5%	17.4%
Coding	MBPP (Pass@1)	3	55.6%	51.0%	45.8%	67.5%	49.4%
Coding	MBPP+ (Pass@1)	3	71.0%	66.1%	61.9%	62.7%	81.8%
Coding	HumanEval (Pass@1)	0	49.9%	34.8%	36.6%	36.0%	64.6%
Coding	HumanEval+ (Pass@1)	0	41.3%	28.1%	28.1%	28.1%	57.3%
Coding	LiveCodeBench v6 (Pass@1)	3	5.1%	2.9%	2.9%	3.4%	9.7%
Coding	CRUXEval (Pass@1)	1	40.6%	42.1%	39.7%	42.3%	55.9%
Coding	RepoBench (EM)	3	21.0%	21.8%	23.0%	25.3%	22.7%
Long Context	LongBench v2 (Acc.)	3	28.8%	26.6%	25.8%	27.8%	27.2%
Long Context	NIAH (Acc.)	/	79.8%	75.0%	83.0%	99.8%	98.8%
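
The metric names in the table can be made concrete with a small sketch. Below, `exact_match` is a simple normalized string comparison and `pass_at_k` is the standard unbiased estimator 1 - C(n-c, k) / C(n, k), which reduces to c/n for Pass@1. The normalization rule, the number of samples per problem, and the decoding settings behind the reported numbers are not stated in this card, so these are illustrative definitions rather than the evaluation harness that produced the table.

```python
from math import comb


def exact_match(prediction: str, reference: str) -> bool:
    """EM after a simple normalization (case and whitespace); ASSUMED rule."""
    def normalize(text: str) -> str:
        return " ".join(text.strip().lower().split())
    return normalize(prediction) == normalize(reference)


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Probability that a random size-k subset contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    print(exact_match("  Paris ", "paris"))
    print(pass_at_k(n=10, c=3, k=1))
```

A benchmark-level score is then the mean of the per-problem values across the test set.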

Agentic Benchmarks

We use APTBench to evaluate the agentic capabilities of the base model.

Category	Qwen3-1.7B-Base	SmolLM3-3B-Base	Gemma3-4B-Base	Qwen3-4B-Base	Llama3.1-8B
Code	25.1%	24.3%	32.8%	41.9%	23.6%
Deep Research	28.5%	27.2%	36.4%	40.5%	30.0%
Math	59.9%	60.7%	59.8%	70.5%	60.1%
Tool	56.7%	59.1%	61.7%	65.8%	64.1%

📚 Citation

If you find our work useful in your research, please consider citing the following paper:

@article{youtu-llm,
 title={Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models},
 author={Tencent Youtu Lab},
 year={2025},
 eprint={2512.24618},
 archivePrefix={arXiv},
 primaryClass={cs.CL},
 url={https://arxiv.org/abs/2512.24618}, 
}

Safetensors: model size 2B params, tensor type BF16

Paper: arXiv:2512.24618, published Dec 31, 2025

URL: https://huggingface.co/tencent/Youtu-LLM-2B-Base