This is a vanilla Bradley-Terry (BT) reward model built on Gemma-2-9B. The training recipe comes from RLHF Workflow.
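As a rough illustration of what a Bradley-Terry objective looks like, the sketch below computes the preference probability and the pairwise negative log-likelihood from two scalar rewards. The function names `bt_preference_probability` and `bt_loss` are hypothetical and chosen for this example only; this is not the training code from RLHF Workflow, just a minimal pure-Python rendering of the standard BT formulation.

```python
import math


def bt_preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the chosen response beats the rejected one.

    P(chosen > rejected) = sigmoid(r_chosen - r_rejected)
    """
    margin = reward_chosen - reward_rejected
    # Branch on the sign so that exp() never receives a large positive argument.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    exp_margin = math.exp(margin)
    return exp_margin / (1.0 + exp_margin)


def bt_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise negative log-likelihood: -log sigmoid(r_chosen - r_rejected).

    Written as softplus(-margin) in a numerically stable form.
    """
    margin = reward_chosen - reward_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Illustrative reward values only, not outputs of the actual model.
    print(bt_preference_probability(1.5, 0.5))
    print(bt_loss(1.5, 0.5))
```

The loss shrinks as the reward of the preferred response exceeds that of the rejected one, which is what pushes a reward model to rank preferred responses higher.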

Results on RewardBench:

Chat: 98.04

Chat Hard: 65.35

Safety: 89.54

Reasoning: 92.31
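For a quick single-number summary, the four category scores above can be averaged. Note that this is a plain unweighted mean computed here for convenience; it is an assumption that it is useful as a summary, and it is not necessarily the overall score that the RewardBench leaderboard reports, which may weight categories differently.

```python
# Category scores copied from the list above.
scores = {
    "Chat": 98.04,
    "Chat Hard": 65.35,
    "Safety": 89.54,
    "Reasoning": 92.31,
}

# Plain unweighted mean over the four categories (not an official leaderboard score).
unweighted_mean = sum(scores.values()) / len(scores)

# The category with the lowest score is where the model is weakest.
weakest_category = min(scores, key=scores.get)

print(f"Unweighted mean: {unweighted_mean:.2f}")  # Unweighted mean: 86.31
print(f"Weakest category: {weakest_category}")    # Weakest category: Chat Hard
```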

For details on the training recipe, please refer to:

@misc{dong2024rlhf,
 title={RLHF Workflow: From Reward Modeling to Online RLHF}, 
 author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang},
 year={2024},
 eprint={2405.07863},
 archivePrefix={arXiv},
 primaryClass={cs.LG}
}

Model size: 9B params (Safetensors)

Tensor type: BF16

Paper: arXiv:2405.07863, published May 13, 2024

URL: https://huggingface.co/sfairXC/FsfairX-Gemma2-RM-v0.1