๐ฅ Chat with Magpie Here!
๐ฆ Llama-3.1-8B-Magpie-Align-v0.2
Project Web: https://magpie-align.github.io/
Online Model Demo: https://huggingface.co/spaces/flydust/Chat-with-Magpie
Arxiv Technical Report: https://arxiv.org/abs/2406.08464
Codes: https://github.com/magpie-align/magpie
๐ง About This Model
This model is an aligned version of meta-llama/Meta-Llama-3.1-8B. We apply the following pipeline:
We first perform SFT using:
- Magpie-Align/Magpie-Llama-3.1-Pro-500K-Filtered
- Magpie-Align/Magpie-Reasoning-150K
- SFT Model Checkpoint: Magpie-Align/Llama-3.1-8B-Magpie-Align-SFT-v0.2
We then perform DPO on the Magpie-Align/Llama-3.1-70B-PO-100K-armorm dataset.
The overall performance is much better than the official Llama-3.1-8B-Instruct Model!
- Alpaca Eval 2 (vs GPT-4-Turbo-1106): 46.68 (LC), 53.42 (WR)
- Arena Hard: 43.2
๐ Other Information
License: Please follow Meta Llama 3.1 Community License.
Conversation Template: Please use Llama 3 official chat template for the best performance.
How to use it? Please check the official Llama 3.1 repository for detailed instructions. Simply replace the original model_id with this model id.
Alignment Pipeline
The detailed alignment pipeline is as follows.
Stage 1: Supervised Fine-tuning
We use Axolotl for SFT. Please refer to the model card of SFT checkpoint for detailed configurations.
Stage 2: Direct Preference Optimization
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5603 | 0.1306 | 100 | 0.5762 | -1.0828 | -1.5526 | 0.7620 | 0.4698 | -505.3885 | -452.3145 | -0.7241 | -0.7285 |
| 0.5441 | 0.2612 | 200 | 0.4445 | -3.4116 | -5.1002 | 0.8360 | 1.6886 | -860.1481 | -685.1905 | -0.6966 | -0.6964 |
| 0.3586 | 0.3919 | 300 | 0.3949 | -3.4100 | -5.2798 | 0.8720 | 1.8698 | -878.1118 | -685.0309 | -0.7677 | -0.7653 |
| 0.3737 | 0.5225 | 400 | 0.3653 | -4.3580 | -6.6737 | 0.8760 | 2.3157 | -1017.5 | -779.8291 | -0.7777 | -0.7711 |
| 0.2611 | 0.6531 | 500 | 0.3457 | -4.9017 | -7.6712 | 0.8860 | 2.7695 | -1117.2515 | -834.2015 | -0.8137 | -0.8074 |
| 0.3342 | 0.7837 | 600 | 0.3354 | -4.7041 | -7.3342 | 0.8920 | 2.6301 | -1083.5503 | -814.4402 | -0.8081 | -0.7999 |
| 0.3251 | 0.9144 | 700 | 0.3335 | -4.8366 | -7.5394 | 0.8880 | 2.7028 | -1104.0730 | -827.6954 | -0.8119 | -0.8042 |
Framework versions
- Transformers 4.43.3
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
Paper Abstract
๐ Citation
If you find the model, data, or code useful, please cite our paper:
@article{xu2024magpie,
title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
year={2024},
eprint={2406.08464},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Please also cite the reward model for creating preference datasets:
ArmoRM paper:
@article{wang2024interpretable,
title={Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts},
author={Wang, Haoxiang and Xiong, Wei and Xie, Tengyang and Zhao, Han and Zhang, Tong},
journal={arXiv preprint arXiv:2406.12845},
year={2024}
}
Questions? Please contact Zhangchen by email.
- Downloads last month
- 8
Model tree for Magpie-Align/Llama-3.1-8B-Magpie-Align-v0.2
Base model
meta-llama/Llama-3.1-8B