Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks

This repository contains model checkpoints from the paper "Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks".

For more details, including code and evaluation procedures, please refer to the official GitHub repository: https://github.com/rioyokotalab/optimal-sparsity
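The repository name of this checkpoint, `llm-jp/optimal-sparsity-code-d2048-E128-k4-52.2B-A2.3B`, appears to encode the model configuration. The sketch below parses that name under an assumed reading of each field: `d2048` as hidden size, `E128` as the number of experts, `k4` as the number of experts routed per token, `52.2B` as total parameters, and `A2.3B` as active parameters. This interpretation is not stated on this page, and `parse_checkpoint_name` and `CheckpointName` are hypothetical helpers written for illustration; consult the paper or the GitHub repository for the authoritative definitions.

```python
import re
from dataclasses import dataclass

# Repository ID as it appears on this page.
REPO_ID = "llm-jp/optimal-sparsity-code-d2048-E128-k4-52.2B-A2.3B"


@dataclass(frozen=True)
class CheckpointName:
    """Fields read from the repository name (meanings are assumptions)."""

    hidden_size: int        # assumed meaning of "d2048"
    num_experts: int        # assumed meaning of "E128"
    top_k: int              # assumed meaning of "k4"
    total_params_b: float   # assumed meaning of "52.2B" (billions)
    active_params_b: float  # assumed meaning of "A2.3B" (billions)

    @property
    def expert_activation_ratio(self) -> float:
        # Fraction of experts selected per token under the assumed reading.
        return self.top_k / self.num_experts

    @property
    def active_param_fraction(self) -> float:
        # Share of parameters that are active under the assumed reading.
        return self.active_params_b / self.total_params_b


# Matches the trailing "d<int>-E<int>-k<int>-<float>B-A<float>B" part of the name.
_NAME_PATTERN = re.compile(
    r"d(?P<d>\d+)-E(?P<e>\d+)-k(?P<k>\d+)"
    r"-(?P<total>\d+(?:\.\d+)?)B-A(?P<active>\d+(?:\.\d+)?)B$"
)


def parse_checkpoint_name(repo_id: str) -> CheckpointName:
    """Parse a repository ID that follows the naming pattern seen on this page."""
    match = _NAME_PATTERN.search(repo_id)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {repo_id!r}")
    return CheckpointName(
        hidden_size=int(match["d"]),
        num_experts=int(match["e"]),
        top_k=int(match["k"]),
        total_params_b=float(match["total"]),
        active_params_b=float(match["active"]),
    )


if __name__ == "__main__":
    info = parse_checkpoint_name(REPO_ID)
    print(info)
    print("expert activation ratio:", info.expert_activation_ratio)
    print("active parameter fraction:", info.active_param_fraction)
```

Under this reading, 4 of 128 experts would be selected per token, and roughly 2.3B of the 52.2B parameters would be active. A helper like this could be useful for filtering the other checkpoints in the collection by configuration, provided their names follow the same pattern.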

How to cite

If you find our work helpful, please feel free to cite the paper.

@inproceedings{nakamura2026optimal,
 title={Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks},
 author={Taishi Nakamura and Satoki Ishikawa and Masaki Kawamura and Takumi Okamoto and Daisuke Nohara and Jun Suzuki and Rio Yokota},
 booktitle={The Fourteenth International Conference on Learning Representations},
 year={2026},
 url={https://openreview.net/forum?id=XFw2EPRUUR}
}

Model size: 52B params (Safetensors)

Tensor type: BF16

Collection: Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks (65 items, updated Aug 21, 2025)

Paper: 2508.18672 (published Aug 26, 2025)

URL: https://huggingface.co/llm-jp/optimal-sparsity-code-d2048-E128-k4-52.2B-A2.3B
