VOOZH about

URL: https://huggingface.co/Magpie-Align/Llama-3-8B-Magpie-Align-v0.1

⇱ Magpie-Align/Llama-3-8B-Magpie-Align-v0.1 · Hugging Face


👁 Magpie

🔥 Chat with Magpie Here!

🐦 Llama-3-8B-Magpie-Align-v0.1

Project Web: https://magpie-align.github.io/

Online Model Demo: https://huggingface.co/spaces/flydust/Chat-with-Magpie

Arxiv Technical Report: https://arxiv.org/abs/2406.08464

Codes: https://github.com/magpie-align/magpie

Model Overview

This model is an aligned version of meta-llama/Meta-Llama-3-8B. We apply the following pipeline:

The overall performance is even better than the official Llama-3-8B-Instruct Model!

  • Alpaca Eval 2 (vs GPT-4-Turbo-1106): 38.52 (LC), 38.47 (WR)
  • Alpaca Eval 2 (vs Llama-3-8B-Instruct): 69.37 (LC), 70.05 (WR)
  • Arena Hard: 32.4
  • WildBench: 39.3 ((was) Best <30B Model! 🏆)
  • Zero-Eval GSM: 54.62

Model Performance

We compare our Llama-3-8B-Magpie-Align with official and other open-aligned LLMs that have been fine-tuned from base models and have publicly released their training datasets. The results are as follows:

+---------------------------------------------+--------------------+--------------------+-----------------------+------------+
| Aligned Model ID | MT-Bench | Alpaca Eval 2 | Alpaca Eval 2 | Arena Hard |
| | | (GPT-4-Turbo-1106) | (Llama-3-8B-Instruct) | |
+---------------------------------------------+------+------+------+----------+---------+-----------+-----------+------------+
| | R1 | R2 | AVG | LC WR | WR | LC WR | WR | Score |
+---------------------------------------------+------+------+------+----------+---------+-----------+-----------+------------+
| meta-llama/Meta-Llama-3-8B-Instruct | 8.31 | 7.65 | 7.98 | 22.92 | 22.57 | 50 | 50 | 20.6 |
+---------------------------------------------+------+------+------+----------+---------+-----------+-----------+------------+
| princeton-nlp/Llama-3-Base-8B-SFT-DPO | 8.12 | 7.23 | 7.67 | 17.71 | 15.34 | 43.73 | 38.80 | 14.8 |
+---------------------------------------------+------+------+------+----------+---------+-----------+-----------+------------+
| NousResearch/Hermes-2-Pro-Llama-3-8B | 8.05 | 7.35 | 7.70 | 15.60 | 12.86 | 36.37 | 30.52 | 11.5 |
+---------------------------------------------+------+------+------+----------+---------+-----------+-----------+------------+
| allenai/llama-3-tulu-2-dpo-8b | 7.71 | 7.15 | 7.43 | 14.89 | 14.80 | 35.43 | 35.42 | 11.7 |
+---------------------------------------------+------+------+------+----------+---------+-----------+-----------+------------+
| cognitivecomputations/dolphin-2.9-llama3-8b | 7.97 | 6.98 | 7.47 | 12.50 | 8.79 | 32.67 | 22.80 | 8.2 |
+---------------------------------------------+------+------+------+----------+---------+-----------+-----------+------------+
| openchat/openchat-3.6-8b-20240522 | 7.83 | 7.23 | 7.53 | 17.70 | 12.53 | 41.30 | 30.79 | 6.7 |
+---------------------------------------------+------+------+------+----------+---------+-----------+-----------+------------+
| Magpie-Align/Llama-3-8B-Magpie-Align-v0.1 | 8.01 | 7.63 | 7.82 | 38.52 | 38.47 | 69.37 | 70.05 | 32.4 |
+---------------------------------------------+------+------+------+----------+---------+-----------+-----------+------------+
| Magpie-Align/Llama-3-8B-Magpie-Align-v0.2 | 7.81 | 7.64 | 7.73 | 49.86 | 51.98 | 75.17 | 78.20 | 37.5 |
+---------------------------------------------+------+------+------+----------+---------+-----------+-----------+------------+

👀 Other Information

License: Please follow Meta Llama 3 Community License.

Conversation Template: Please use Llama 3 official chat template for the best performance.

How to use it? Please check the official Llama 3 repository for detailed instructions. Simply replace the original model_id with Magpie-Align/Llama-3-8B-Magpie-Align-v0.1.

The detailed training pipeline is as follows.

Stage 1: Supervised Fine-tuning

We use Axolotl for SFT.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 2

Training results

Training Loss Epoch Step Validation Loss
0.8807 0.0007 1 0.9001
0.5113 0.3337 464 0.5178
0.4668 0.6673 928 0.4792
0.4492 1.0010 1392 0.4582
0.3498 1.3205 1856 0.4575
0.3525 1.6542 2320 0.4555

Framework versions

  • Transformers 4.40.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1

👁 Built with Axolotl

Stage 2: Direct Preference Optimization

We use alignment handbook for DPO.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.628 0.2138 100 0.6641 -0.8806 -1.0146 0.6240 0.1340 -362.7133 -343.6060 -0.7539 -0.7528
0.6935 0.4275 200 0.6352 -1.3660 -1.6311 0.6545 0.2651 -424.3628 -392.1437 -0.6649 -0.6629
0.6376 0.6413 300 0.6178 -1.3533 -1.6413 0.6748 0.2880 -425.3859 -390.8818 -0.6753 -0.6758
0.5888 0.8550 400 0.6088 -1.6321 -1.9785 0.6829 0.3464 -459.1051 -418.7560 -0.6440 -0.6435

It achieves the following results on the evaluation set:

  • Loss: 0.6084
  • Rewards/chosen: -1.6265
  • Rewards/rejected: -1.9735
  • Rewards/accuracies: 0.6809
  • Rewards/margins: 0.3470
  • Logps/rejected: -458.6070
  • Logps/chosen: -418.2021
  • Logits/rejected: -0.6447
  • Logits/chosen: -0.6439

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Downstream Performance

Datasets Llama-3-8B-Magpie-Align-v0.1
MMLU (5) 64.61
ARC (25) 62.03
HellaSwag (25) 82.10
TruthfulQA (0) 58.26
Winogrande (5) 73.01

Paper Abstract

📚 Citation

If you find the model, data, or code useful, please cite our paper:

@article{xu2024magpie,
 title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
 author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
 year={2024},
 eprint={2406.08464},
 archivePrefix={arXiv},
 primaryClass={cs.CL}
}

Please also cite the creators of preference datasets:

SimPO paper:

@article{meng2024simpo,
 title={{SimPO}: Simple preference optimization with a reference-free reward},
 author={Meng, Yu and Xia, Mengzhou and Chen, Danqi},
 journal={arXiv preprint arXiv:2405.14734},
 year={2024}
}

UltraFeedback paper:

@article{cui2023ultrafeedback,
 title={{UltraFeedback}: Boosting language models with high-quality feedback},
 author={Cui, Ganqu and Yuan, Lifan and Ding, Ning and Yao, Guanming and Zhu, Wei and Ni, Yuan and Xie, Guotong and Liu, Zhiyuan and Sun, Maosong},
 journal={arXiv preprint arXiv:2310.01377},
 year={2023}
}

ArmoRM paper:

@article{wang2024interpretable,
 title={Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts},
 author={Wang, Haoxiang and Xiong, Wei and Xie, Tengyang and Zhao, Han and Zhang, Tong},
 journal={arXiv preprint arXiv:2406.12845},
 year={2024}
}

Questions? Please contact Zhangchen by email.

Downloads last month
14
Safetensors
Model size
8B params
Tensor type
BF16
·

Model tree for Magpie-Align/Llama-3-8B-Magpie-Align-v0.1

Finetuned
(1)
this model
Quantizations
5 models

Datasets used to train Magpie-Align/Llama-3-8B-Magpie-Align-v0.1

Spaces using Magpie-Align/Llama-3-8B-Magpie-Align-v0.1 10

Collection including Magpie-Align/Llama-3-8B-Magpie-Align-v0.1

Papers for Magpie-Align/Llama-3-8B-Magpie-Align-v0.1