🔥 Chat with Magpie Here!

🐦 Llama-3-8B-Magpie-Align-v0.3

Project Web: https://magpie-align.github.io/

Online Model Demo: https://huggingface.co/spaces/flydust/Chat-with-Magpie

Arxiv Technical Report: https://arxiv.org/abs/2406.08464

Codes: https://github.com/magpie-align/magpie

🧐 About This Model

This model is an aligned version of meta-llama/Meta-Llama-3-8B. We apply the following pipeline:

We first perform SFT using:

Magpie-Align/Magpie-Pro-MT-300K-v0.1
Magpie-Align/Magpie-Reasoning-150K
Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese
SFT Model Checkpoint: Magpie-Align/Llama-3-8B-Magpie-Align-SFT-v0.3

We then perform DPO on the princeton-nlp/llama3-ultrafeedback-armorm dataset.

The overall performance is much better than the official Llama-3-8B-Instruct Model! Plus, it can answer Chinese queries frequently, thanks to our new Chinese instruction dataset!

Alpaca Eval 2 (vs GPT-4-Turbo-1106): 48.58 (LC), 50.36 (WR)
Alpaca Eval 2 (vs Llama-3-8B-Instruct): 73.65 (LC), 75.81 (WR)
Arena Hard: 42.2
WildBench WB-Score: 41.1
Zero-Eval GSM: 50.0

🔥 Model Performance

We compare our Llama-3-8B-Magpie-Align with official and other open-aligned LLMs that have been fine-tuned from base models and have publicly released their training datasets. The results are as follows:

+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| Aligned Model ID | MT-Bench | | | Alpaca Eval 2 | | Alpaca Eval 2 | | Arena Hard |
| | | | | (GPT-4-Turbo-1106) | | (Llama-3-8B-Instruct) | | |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| | R1 | R2 | AVG | LC WR | WR | LC WR | WR | Score |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| meta-llama/Meta-Llama-3-8B-Instruct | 8.31 | 7.65 | 7.98 | 22.92 | 22.57 | 50 | 50 | 20.6 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| princeton-nlp/Llama-3-Base-8B-SFT-DPO | 8.12 | 7.23 | 7.67 | 17.71 | 15.34 | 43.73 | 38.80 | 14.8 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| NousResearch/Hermes-2-Pro-Llama-3-8B | 8.05 | 7.35 | 7.70 | 15.60 | 12.86 | 36.37 | 30.52 | 11.5 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| allenai/llama-3-tulu-2-dpo-8b | 7.71 | 7.15 | 7.43 | 14.89 | 14.8 | 35.43 | 35.42 | 11.7 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| cognitivecomputations/dolphin-2.9-llama3-8b | 7.97 | 6.98 | 7.47 | 12.50 | 8.79 | 32.67 | 22.80 | 8.2 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| openchat/openchat-3.6-8b-20240522 | 7.83 | 7.23 | 7.53 | 17.70 | 12.53 | 41.30 | 30.79 | 6.7 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| Magpie-Align/Llama-3-8B-Magpie-Align-v0.1 | 8.01 | 7.63 | 7.82 | 38.52 | 38.47 | 69.37 | 70.05 | 32.4 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| Magpie-Align/Llama-3-8B-Magpie-Align-v0.2 | 7.81 | 7.64 | 7.73 | 49.86 | 51.98 | 75.17 | 78.20 | 37.5 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| Magpie-Align/Llama-3-8B-Magpie-Align-v0.3 | 7.82 | 7.51 | 7.67 | 48.58 | 50.36 | 73.65 | 75.81 | 42.2 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+

👀 Other Information

License: Please follow Meta Llama 3 Community License.

Conversation Template: Please use Llama 3 official chat template for the best performance.

How to use it? Please check the official Llama 3 repository for detailed instructions. Simply replace the original model_id with Magpie-Align/Llama-3-8B-Magpie-Align-SFT-v1.0.

The detailed training pipeline is as follows.

Stage 1: Supervised Fine-tuning

We use Axolotl for SFT.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 32
total_train_batch_size: 128
total_eval_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 98
num_epochs: 2

Training results

Training Loss	Epoch	Step	Validation Loss
0.8616	0.0019	1	0.8870
0.5554	0.2013	106	0.5568
0.5067	0.4027	212	0.5065
0.4728	0.6040	318	0.4865
0.4681	0.8054	424	0.4740
0.4563	1.0067	530	0.4662
0.4115	1.1944	636	0.4642
0.3993	1.3957	742	0.4620
0.4048	1.5971	848	0.4613
0.4167	1.7984	954	0.4611

Framework versions

Transformers 4.42.3
Pytorch 2.3.1+cu121
Datasets 2.19.1
Tokenizers 0.19.1

Internal name for identification: Llama-3-8B-Magpie-Mix-RC. Please change the model name in the below Axolotl config.

👁 Built with Axolotl

Stage 2: Direct Preference Optimization

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 7e-07
train_batch_size: 2
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 8
total_train_batch_size: 128
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.5813	0.2137	100	0.5238	-2.6816	-3.4539	0.7298	0.7723	-612.4234	-541.2933	-1.1244	-1.1082
0.5021	0.4275	200	0.4483	-3.4053	-4.4858	0.8024	1.0805	-715.6146	-613.6641	-1.1035	-1.0844
0.3802	0.6412	300	0.4069	-3.7974	-5.1705	0.8427	1.3731	-784.0882	-652.8716	-1.1310	-1.1105
0.3827	0.8549	400	0.3872	-4.3693	-5.9670	0.8710	1.5976	-863.7308	-710.0647	-1.1495	-1.1283

Framework versions

Transformers 4.42.3
Pytorch 2.3.1+cu121
Datasets 2.20.0
Tokenizers 0.19.1

Downstream Performance (Lighteval)

Datasets	Llama-3-8B-Magpie-Align-v0.3
MMLU (5)	65.69
ARC (25)	63.23
HellaSwag (25)	82.15
TruthfulQA (0)	60.97
Winogrande (5)	73.64

Paper Abstract

📚 Citation

If you find the model, data, or code useful, please cite our paper:

@article{xu2024magpie,
 title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
 author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
 year={2024},
 eprint={2406.08464},
 archivePrefix={arXiv},
 primaryClass={cs.CL}
}

Please also cite the creators of preference datasets:

SimPO paper:

@article{meng2024simpo,
 title={{SimPO}: Simple preference optimization with a reference-free reward},
 author={Meng, Yu and Xia, Mengzhou and Chen, Danqi},
 journal={arXiv preprint arXiv:2405.14734},
 year={2024}
}

UltraFeedback paper:

@article{cui2023ultrafeedback,
 title={{UltraFeedback}: Boosting language models with high-quality feedback},
 author={Cui, Ganqu and Yuan, Lifan and Ding, Ning and Yao, Guanming and Zhu, Wei and Ni, Yuan and Xie, Guotong and Liu, Zhiyuan and Sun, Maosong},
 journal={arXiv preprint arXiv:2310.01377},
 year={2023}
}

ArmoRM paper:

@article{wang2024interpretable,
 title={Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts},
 author={Wang, Haoxiang and Xiong, Wei and Xie, Tengyang and Zhao, Han and Zhang, Tong},
 journal={arXiv preprint arXiv:2406.12845},
 year={2024}
}

Questions? Please contact Zhangchen by email.