🔥 Chat with Magpie Here!
🐦 Llama-3-8B-Magpie-Align-v0.3
Project Web: https://magpie-align.github.io/
Online Model Demo: https://huggingface.co/spaces/flydust/Chat-with-Magpie
Arxiv Technical Report: https://arxiv.org/abs/2406.08464
Codes: https://github.com/magpie-align/magpie
🧐 About This Model
This model is an aligned version of meta-llama/Meta-Llama-3-8B. We apply the following pipeline:
We first perform SFT using:
- Magpie-Align/Magpie-Pro-MT-300K-v0.1
- Magpie-Align/Magpie-Reasoning-150K
- Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese
- SFT Model Checkpoint: Magpie-Align/Llama-3-8B-Magpie-Align-SFT-v0.3
We then perform DPO on the princeton-nlp/llama3-ultrafeedback-armorm dataset.
The overall performance is much better than the official Llama-3-8B-Instruct Model! Plus, it can answer Chinese queries frequently, thanks to our new Chinese instruction dataset!
- Alpaca Eval 2 (vs GPT-4-Turbo-1106): 48.58 (LC), 50.36 (WR)
- Alpaca Eval 2 (vs Llama-3-8B-Instruct): 73.65 (LC), 75.81 (WR)
- Arena Hard: 42.2
- WildBench WB-Score: 41.1
- Zero-Eval GSM: 50.0
🔥 Model Performance
We compare our Llama-3-8B-Magpie-Align with official and other open-aligned LLMs that have been fine-tuned from base models and have publicly released their training datasets. The results are as follows:
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| Aligned Model ID | MT-Bench | | | Alpaca Eval 2 | | Alpaca Eval 2 | | Arena Hard |
| | | | | (GPT-4-Turbo-1106) | | (Llama-3-8B-Instruct) | | |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| | R1 | R2 | AVG | LC WR | WR | LC WR | WR | Score |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| meta-llama/Meta-Llama-3-8B-Instruct | 8.31 | 7.65 | 7.98 | 22.92 | 22.57 | 50 | 50 | 20.6 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| princeton-nlp/Llama-3-Base-8B-SFT-DPO | 8.12 | 7.23 | 7.67 | 17.71 | 15.34 | 43.73 | 38.80 | 14.8 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| NousResearch/Hermes-2-Pro-Llama-3-8B | 8.05 | 7.35 | 7.70 | 15.60 | 12.86 | 36.37 | 30.52 | 11.5 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| allenai/llama-3-tulu-2-dpo-8b | 7.71 | 7.15 | 7.43 | 14.89 | 14.8 | 35.43 | 35.42 | 11.7 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| cognitivecomputations/dolphin-2.9-llama3-8b | 7.97 | 6.98 | 7.47 | 12.50 | 8.79 | 32.67 | 22.80 | 8.2 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| openchat/openchat-3.6-8b-20240522 | 7.83 | 7.23 | 7.53 | 17.70 | 12.53 | 41.30 | 30.79 | 6.7 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| Magpie-Align/Llama-3-8B-Magpie-Align-v0.1 | 8.01 | 7.63 | 7.82 | 38.52 | 38.47 | 69.37 | 70.05 | 32.4 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| Magpie-Align/Llama-3-8B-Magpie-Align-v0.2 | 7.81 | 7.64 | 7.73 | 49.86 | 51.98 | 75.17 | 78.20 | 37.5 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| Magpie-Align/Llama-3-8B-Magpie-Align-v0.3 | 7.82 | 7.51 | 7.67 | 48.58 | 50.36 | 73.65 | 75.81 | 42.2 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
👀 Other Information
License: Please follow Meta Llama 3 Community License.
Conversation Template: Please use Llama 3 official chat template for the best performance.
How to use it? Please check the official Llama 3 repository for detailed instructions. Simply replace the original model_id with Magpie-Align/Llama-3-8B-Magpie-Align-SFT-v1.0.
The detailed training pipeline is as follows.
Stage 1: Supervised Fine-tuning
We use Axolotl for SFT.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 98
- num_epochs: 2
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.8616 | 0.0019 | 1 | 0.8870 |
| 0.5554 | 0.2013 | 106 | 0.5568 |
| 0.5067 | 0.4027 | 212 | 0.5065 |
| 0.4728 | 0.6040 | 318 | 0.4865 |
| 0.4681 | 0.8054 | 424 | 0.4740 |
| 0.4563 | 1.0067 | 530 | 0.4662 |
| 0.4115 | 1.1944 | 636 | 0.4642 |
| 0.3993 | 1.3957 | 742 | 0.4620 |
| 0.4048 | 1.5971 | 848 | 0.4613 |
| 0.4167 | 1.7984 | 954 | 0.4611 |
Framework versions
- Transformers 4.42.3
- Pytorch 2.3.1+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
Internal name for identification: Llama-3-8B-Magpie-Mix-RC. Please change the model name in the below Axolotl config.
Stage 2: Direct Preference Optimization
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5813 | 0.2137 | 100 | 0.5238 | -2.6816 | -3.4539 | 0.7298 | 0.7723 | -612.4234 | -541.2933 | -1.1244 | -1.1082 |
| 0.5021 | 0.4275 | 200 | 0.4483 | -3.4053 | -4.4858 | 0.8024 | 1.0805 | -715.6146 | -613.6641 | -1.1035 | -1.0844 |
| 0.3802 | 0.6412 | 300 | 0.4069 | -3.7974 | -5.1705 | 0.8427 | 1.3731 | -784.0882 | -652.8716 | -1.1310 | -1.1105 |
| 0.3827 | 0.8549 | 400 | 0.3872 | -4.3693 | -5.9670 | 0.8710 | 1.5976 | -863.7308 | -710.0647 | -1.1495 | -1.1283 |
Framework versions
- Transformers 4.42.3
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
Downstream Performance (Lighteval)
| Datasets | Llama-3-8B-Magpie-Align-v0.3 |
|---|---|
| MMLU (5) | 65.69 |
| ARC (25) | 63.23 |
| HellaSwag (25) | 82.15 |
| TruthfulQA (0) | 60.97 |
| Winogrande (5) | 73.64 |
Paper Abstract
📚 Citation
If you find the model, data, or code useful, please cite our paper:
@article{xu2024magpie,
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Please also cite the creators of preference datasets:
SimPO paper:
@article{meng2024simpo,
title={{SimPO}: Simple preference optimization with a reference-free reward},
author={Meng, Yu and Xia, Mengzhou and Chen, Danqi},
journal={arXiv preprint arXiv:2405.14734},
year={2024}
}
UltraFeedback paper:
@article{cui2023ultrafeedback,
title={{UltraFeedback}: Boosting language models with high-quality feedback},
author={Cui, Ganqu and Yuan, Lifan and Ding, Ning and Yao, Guanming and Zhu, Wei and Ni, Yuan and Xie, Guotong and Liu, Zhiyuan and Sun, Maosong},
journal={arXiv preprint arXiv:2310.01377},
year={2023}
}
ArmoRM paper:
@article{wang2024interpretable,
title={Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts},
author={Wang, Haoxiang and Xiong, Wei and Xie, Tengyang and Zhao, Han and Zhang, Tong},
journal={arXiv preprint arXiv:2406.12845},
year={2024}
}
Questions? Please contact Zhangchen by email.
- Downloads last month
- 7,141
Model tree for Magpie-Align/Llama-3-8B-Magpie-Align-v0.3
Base model
meta-llama/Meta-Llama-3-8B