
This model was trained using the method described in the paper M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models.

Model	AIME 2025	AIME 2024	MATH 500	AMC 2023	OlympiadBench
Qwen2.5-Math-7B-Instruct (Transformer)	–	13.3	79.8	50.6	40.7
rStar-Math-7B (Transformer)	–	26.7	78.4	47.5	47.1
Eurus-2-7B-PRIME (Transformer)	–	26.7	79.2	57.8	42.1
Qwen2.5-7B-SimpleRL (Transformer)	–	26.7	82.4	62.5	43.3
DeepSeek-R1-Distill-Qwen-1.5B (Transformer)	23.0	28.8	82.8	62.9	43.3
M1-3B (Mamba Hybrid Models)	23.5	28.5	84.0	62.8	47.3
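
To make the comparison in the table easier to read, the following minimal sketch computes per-benchmark score differences between two rows. The scores are copied from the table above; the names `BENCHMARKS`, `SCORES`, and `score_deltas` are illustrative helpers introduced here and are not part of the M1 codebase. Benchmarks that a model does not report (shown as "–") are skipped.

```python
# Scores copied from the results table; None marks entries reported as "–".
BENCHMARKS = ["AIME 2025", "AIME 2024", "MATH 500", "AMC 2023", "OlympiadBench"]

SCORES = {
    "Qwen2.5-Math-7B-Instruct": [None, 13.3, 79.8, 50.6, 40.7],
    "rStar-Math-7B": [None, 26.7, 78.4, 47.5, 47.1],
    "Eurus-2-7B-PRIME": [None, 26.7, 79.2, 57.8, 42.1],
    "Qwen2.5-7B-SimpleRL": [None, 26.7, 82.4, 62.5, 43.3],
    "DeepSeek-R1-Distill-Qwen-1.5B": [23.0, 28.8, 82.8, 62.9, 43.3],
    "M1-3B": [23.5, 28.5, 84.0, 62.8, 47.3],
}


def score_deltas(model, baseline):
    """Return {benchmark: model score - baseline score}.

    Only benchmarks reported by both models are included.
    (Hypothetical helper for illustration.)
    """
    deltas = {}
    for name, a, b in zip(BENCHMARKS, SCORES[model], SCORES[baseline]):
        if a is not None and b is not None:
            # Round to one decimal to match the precision of the table.
            deltas[name] = round(a - b, 1)
    return deltas


if __name__ == "__main__":
    for name, delta in score_deltas("M1-3B", "DeepSeek-R1-Distill-Qwen-1.5B").items():
        print(f"{name}: {delta:+.1f}")
```

Positive values mean M1-3B scores higher than the baseline on that benchmark. Differences of a few tenths of a point are small relative to the size of these test sets, so they are best read as rough parity rather than a clear win for either model.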

Code: https://github.com/jxiw/M1

@article{wang2025m1scalabletesttimecompute,
 title={M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models}, 
 author={Junxiong Wang and Wen-Ding Li and Daniele Paliotta and Daniel Ritter and Alexander M. Rush and Tri Dao},
 journal={arXiv preprint arXiv:2504.10449},
 year={2025},
 url={https://arxiv.org/abs/2504.10449}, 
}

Format: Safetensors
Model size: 3B params
Tensor type: BF16

Paper for togethercomputer/M1-3B

Paper: arXiv:2504.10449, published Apr 14, 2025

URL: https://huggingface.co/togethercomputer/M1-3B
