Model Description

Themis is a tool-augmented preference model that addresses the limitations of conventional reward models (RMs) by giving them access to external environments, including calculators and search engines. It was introduced in the ICLR 2024 paper "Tool-Augmented Reward Modeling" and first released in this repository. Themis-7b is trained on TARA, a tool-augmented reward dataset, and achieves an overall improvement of 17.7% across eight tasks in preference ranking.
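To make the idea of a tool-augmented preference judgment concrete, here is a minimal toy sketch in Python. It is not the Themis implementation and does not use the model weights: the `calculator`, `toy_reward`, and `rank` functions are hypothetical names introduced only for illustration, and the exact-match scoring rule is an assumption standing in for a learned reward. The sketch only shows the general pattern of consulting an external tool and using its observation to compare two candidate answers.

```python
import ast
import operator

# Hypothetical toy "calculator" tool (not part of Themis): safely evaluates
# arithmetic expressions built from +, -, *, / and parentheses.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> float:
    """Evaluate a basic arithmetic expression without using eval()."""

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


def toy_reward(expression: str, claimed_answer: float) -> float:
    """Assumed scoring rule: 1.0 if the answer matches the tool observation, else 0.0."""
    observation = calculator(expression)  # the external "tool call"
    return 1.0 if abs(observation - claimed_answer) < 1e-9 else 0.0


def rank(expression: str, answer_a: float, answer_b: float) -> str:
    """Prefer whichever candidate answer gets the higher toy reward."""
    reward_a = toy_reward(expression, answer_a)
    reward_b = toy_reward(expression, answer_b)
    if reward_a > reward_b:
        return "A"
    if reward_b > reward_a:
        return "B"
    return "tie"


if __name__ == "__main__":
    print(rank("12 * (3 + 4)", 84, 74))
```

In this toy setting the tool observation fully determines the score; a real preference model would instead condition a learned reward on the tool output together with the question and the candidate response.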

🔥 News

9 February, 2024: 🎉 We release the official codebase and model weights of ernie-research/Themis-7b. Stay tuned!🔥
16 January, 2024: 🎉 Our work has been accepted to ICLR 2024 as a spotlight! ✨

Citation

@inproceedings{tarm-2024-ernie,
 author = {Lei Li and
 Yekun Chai and
 Shuohuan Wang and
 Yu Sun and
 Hao Tian and
 Ningyu Zhang and
 Hua Wu},
 title = {Tool-Augmented Reward Modeling},
 booktitle = {The Twelfth International Conference on Learning Representations (ICLR)},
 year = {2024},
 url = {https://openreview.net/forum?id=d94x0gWTUX},
}


Paper: arXiv:2310.01045 (published Oct 2, 2023)

URL: https://huggingface.co/ernie-research/Themis-7b