
URL: https://huggingface.co/ernie-research/Themis-7b

ernie-research/Themis-7b · Hugging Face


๐Ÿ‘ ICLR 2024

Official checkpoint for Tool-Augmented Reward Modeling (ICLR 2024 spotlight).

Model Description

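Conventional reward models (RMs) often struggle with fundamental functionality such as arithmetic computation, code execution, and factual lookup. Themis is a tool-augmented preference model that addresses these limitations by giving RMs access to external environments, including calculators and search engines. It was introduced in the ICLR 2024 paper "Tool-Augmented Reward Modeling" and first released in this repository. Themis-7b is trained on TARA, the tool-augmented reward dataset introduced in the same paper, and achieves an overall improvement of 17.7% across eight preference-ranking tasks.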
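The card does not show how Themis is invoked, so the following is only a toy sketch of the general idea, not the released codebase or its API. It assumes a single calculator tool: the "reward model" sends the arithmetic in a question to the tool, grounds its score in the tool's observation, and preference ranking then prefers whichever of two candidate responses scores higher. All names here (`calculator`, `Trace`, `toy_tool_augmented_reward`, `prefers_first`) are hypothetical. In the actual model, choosing the tool call and producing the score are learned; this sketch hard-codes both steps.

```python
import ast
import operator
from dataclasses import dataclass

# Hypothetical calculator tool: supports +, -, *, / and unary minus on numbers.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> float:
    """Evaluate a simple arithmetic expression without calling eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval"))


@dataclass
class Trace:
    action: str        # which tool was invoked (hypothetical field)
    action_input: str  # what was sent to the tool
    observation: str   # what the tool returned
    reward: float      # final scalar score


def toy_tool_augmented_reward(expression: str, claimed_answer: float) -> Trace:
    """Score an answer by checking it against the calculator's observation.

    Hard-coded rule for illustration: +1.0 if the claimed answer matches
    the tool result, otherwise -1.0. A trained RM would learn this judgement.
    """
    result = calculator(expression)
    reward = 1.0 if abs(result - claimed_answer) < 1e-9 else -1.0
    return Trace("Calculator", expression, str(result), reward)


def prefers_first(trace_a: Trace, trace_b: Trace) -> bool:
    """Preference ranking: the response with the higher reward is preferred."""
    return trace_a.reward > trace_b.reward


if __name__ == "__main__":
    good = toy_tool_augmented_reward("37 * 12", 444)
    bad = toy_tool_augmented_reward("37 * 12", 434)
    print(good)
    print(bad)
    print("prefers first:", prefers_first(good, bad))
```

The parser-based calculator is used instead of `eval()` so that only plain arithmetic reaches the "tool"; anything else (for example a function call) raises `ValueError`.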

🔥 News

  • 9 February 2024: 🎉 We released the official codebase and model weights of ernie-research/Themis-7b. Stay tuned! 🔥
  • 16 January 2024: 🎉 Our work has been accepted to ICLR 2024 as a spotlight paper! ✨

Citation

@inproceedings{tarm-2024-ernie,
 author = {Lei Li and
 Yekun Chai and
 Shuohuan Wang and
 Yu Sun and
 Hao Tian and
 Ningyu Zhang and
 Hua Wu},
 title = {Tool-Augmented Reward Modeling},
 booktitle = {The Twelfth International Conference on Learning Representations (ICLR)},
 year = {2024},
 url = {https://openreview.net/forum?id=d94x0gWTUX},
}