This model was released with our paper, REBEL: Reinforcement Learning via Regressing Relative Rewards.

REBEL-Llama-3-Armo-iter_1

This model was trained with REBEL, starting from Meta-Llama-3-8B-Instruct, with ArmoRM-Llama3-8B-v0.1 as the reward model and the UltraFeedback dataset. The training code is available at https://github.com/ZhaolinGao/REBEL. We collect offline generations for the entire dataset, using the best-of-5 generation as the chosen response and the worst-of-5 as the rejected response (Ultrafeedback-Llama-3-Armo-iter_1).
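
The best-of-5 / worst-of-5 selection described above can be sketched as follows. This is a minimal illustration, not the code from the training repository: the function name `build_preference_pair`, the output field names, and the `toy_score` scorer are all hypothetical, and the sketch assumes "best" and "worst" mean the highest and lowest reward-model score among the sampled responses. In practice the scorer would wrap the reward model.

```python
from typing import Callable, Dict, List


def build_preference_pair(
    prompt: str,
    responses: List[str],
    score_fn: Callable[[str, str], float],
) -> Dict[str, object]:
    """Pick the highest- and lowest-scoring responses for one prompt.

    `score_fn(prompt, response)` is a hypothetical stand-in for a reward
    model call; any callable returning a float works.
    """
    if len(responses) < 2:
        raise ValueError("need at least two responses to form a pair")
    scores = [score_fn(prompt, r) for r in responses]
    indices = range(len(responses))
    best = max(indices, key=scores.__getitem__)    # best-of-n -> chosen
    worst = min(indices, key=scores.__getitem__)   # worst-of-n -> rejected
    return {
        "prompt": prompt,
        "chosen": responses[best],
        "rejected": responses[worst],
        "chosen_reward": scores[best],
        "rejected_reward": scores[worst],
    }


def toy_score(prompt: str, response: str) -> float:
    # Toy scorer for illustration only: longer responses score higher.
    return float(len(response))


# Five candidate responses for a single prompt (n = 5).
candidates = ["ok", "a longer answer", "mid one", "x", "short"]
pair = build_preference_pair("Explain REBEL.", candidates, toy_score)
print(pair["chosen"], "|", pair["rejected"])
```

Keeping both rewards alongside the pair is a design choice of this sketch; it is convenient for methods that regress on the reward difference between the two responses.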

Links to Other Models

REBEL-OpenChat-3.5

REBEL-Llama-3

REBEL-Llama-3-epoch_2

REBEL-Llama-3-Armo-iter_2

REBEL-Llama-3-Armo-iter_3

Evaluations

Model	AlpacaEval 2.0 LC Win Rate	AlpacaEval 2.0 Win Rate	MT-Bench Average	MMLU (5-shot)	GSM8K (5-shot)
REBEL-OpenChat-3.5	17.3	12.8	8.06	63.7	68.8
REBEL-Llama-3	30.1	32.6	8.16	65.8	75.6
REBEL-Llama-3-epoch_2	31.3	34.2	7.83	65.4	75.4
REBEL-Llama-3-Armo-iter_1	48.3	41.8	8.13	66.3	75.8
REBEL-Llama-3-Armo-iter_2	50.0	48.5	8.07	65.9	75.4
REBEL-Llama-3-Armo-iter_3	49.7	48.1	8.01	66.0	75.7

Citation

Please cite our paper if you use this model in your own work:

@misc{gao2024rebel,
 title={REBEL: Reinforcement Learning via Regressing Relative Rewards}, 
 author={Zhaolin Gao and Jonathan D. Chang and Wenhao Zhan and Owen Oertell and Gokul Swamy and Kianté Brantley and Thorsten Joachims and J. Andrew Bagnell and Jason D. Lee and Wen Sun},
 year={2024},
 eprint={2404.16767},
 archivePrefix={arXiv},
 primaryClass={cs.LG}
}

Model size: 8B params. Tensor type: BF16 (Safetensors).

Base model: meta-llama/Meta-Llama-3-8B-Instruct

Paper: arXiv:2404.16767, published Apr 25, 2024

URL: https://huggingface.co/Cornell-AGI/REBEL-Llama-3-Armo-iter_1
