URL: https://huggingface.co/sfairXC/FsfairX-Gemma2-RM-v0.1

sfairXC/FsfairX-Gemma2-RM-v0.1


This is a vanilla Bradley-Terry (BT) reward model based on Gemma-2-9B. The training recipe is from RLHF Workflow.
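
As a rough illustration of what "Bradley-Terry" means here, the sketch below shows the standard BT preference probability and the pairwise negative log-likelihood that such reward models are commonly trained with. It is a minimal pure-Python sketch, not the authors' training code; the function names `bt_preference_probability` and `bt_loss` and the scalar reward inputs are assumptions made for illustration.

```python
import math


def bt_preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the chosen response is preferred.

    P(chosen > rejected) = sigmoid(r_chosen - r_rejected).
    Function name and signature are hypothetical, for illustration only.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable sigmoid: avoid exp() of a large positive number.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


def bt_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise negative log-likelihood: -log sigmoid(r_chosen - r_rejected).

    Computed as softplus(-margin) in a numerically stable form.
    """
    margin = reward_chosen - reward_rejected
    # softplus(x) = max(x, 0) + log1p(exp(-|x|)), evaluated at x = -margin.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Hypothetical scalar rewards for a (chosen, rejected) response pair.
    print(bt_preference_probability(2.0, 0.5))
    print(bt_loss(2.0, 0.5))
```

The loss shrinks as the reward of the chosen response rises above that of the rejected one, so training pushes the model to score preferred responses higher.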

The model obtains the following RewardBench results:

Chat: 98.04

Chat Hard: 65.35

Safety: 89.54

Reasoning: 92.31

For details, please refer to the RLHF Workflow paper:

@misc{dong2024rlhf,
 title={RLHF Workflow: From Reward Modeling to Online RLHF}, 
 author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang},
 year={2024},
 eprint={2405.07863},
 archivePrefix={arXiv},
 primaryClass={cs.LG}
}

Format: Safetensors

Model size: 9B params

Tensor type: BF16
