
URL: https://huggingface.co/OctoThinker/OctoThinker-3B-Long-Base



OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling

OctoThinker-3B-Long-Base

The OctoThinker family is built on carefully studied mid-training insights, starting from the Llama-3 family, to create a reinforcement learning–friendly base language model.

Training Recipe

Evaluation Results

Note that we evaluate these base language models with few-shot prompting.
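For readers unfamiliar with few-shot prompting of a base model, the idea can be sketched as follows: a handful of solved examples are concatenated ahead of the unsolved question, and the model is left to continue the text. This is only an illustrative sketch. The `Question:`/`Answer:` template, the helper name `build_few_shot_prompt`, and the demonstration items are assumptions made for this example, not the prompt format or shot count used in the OctoThinker evaluation.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    examples: Sequence[Tuple[str, str]],
    question: str,
) -> str:
    """Concatenate solved (question, answer) pairs, then the unsolved question.

    The "Question:/Answer:" layout is an assumed template for illustration.
    """
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    # The final block ends at "Answer:" so a base model continues from there.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


# Hypothetical demonstrations; not taken from any OctoThinker benchmark.
demos = [
    ("What is 2 + 3?", "5"),
    ("What is 7 * 6?", "42"),
]
prompt = build_few_shot_prompt(demos, "What is 9 - 4?")
print(prompt)
```

The resulting string would then be passed as plain text to the base model for completion; no chat template is involved, since a base model is not instruction-tuned.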

More about OctoThinker

Citation

Check out our paper for more details. If you use our models or datasets, or find our work useful, please cite:

@article{wang2025octothinker,
 title={OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling},
 author={Wang, Zengzhi and Zhou, Fan and Li, Xuefeng and Liu, Pengfei},
 year={2025},
 journal={arXiv preprint arXiv:2506.20512},
 note={Preprint}
}
Format: Safetensors
Model size: 3B params
Tensor type: BF16
