VOOZH about

URL: https://huggingface.co/Magpie-Align/Llama-3-8B-Magpie-Align-v0.3

⇱ Magpie-Align/Llama-3-8B-Magpie-Align-v0.3 · Hugging Face


👁 Magpie

🔥 Chat with Magpie Here!

🐦 Llama-3-8B-Magpie-Align-v0.3

Project Web: https://magpie-align.github.io/

Online Model Demo: https://huggingface.co/spaces/flydust/Chat-with-Magpie

Arxiv Technical Report: https://arxiv.org/abs/2406.08464

Codes: https://github.com/magpie-align/magpie

🧐 About This Model

This model is an aligned version of meta-llama/Meta-Llama-3-8B. We apply the following pipeline:

We first perform SFT using:

We then perform DPO on the princeton-nlp/llama3-ultrafeedback-armorm dataset.

The overall performance is much better than the official Llama-3-8B-Instruct Model! Plus, it can answer Chinese queries frequently, thanks to our new Chinese instruction dataset!

  • Alpaca Eval 2 (vs GPT-4-Turbo-1106): 48.58 (LC), 50.36 (WR)
  • Alpaca Eval 2 (vs Llama-3-8B-Instruct): 73.65 (LC), 75.81 (WR)
  • Arena Hard: 42.2
  • WildBench WB-Score: 41.1
  • Zero-Eval GSM: 50.0

🔥 Model Performance

We compare our Llama-3-8B-Magpie-Align with official and other open-aligned LLMs that have been fine-tuned from base models and have publicly released their training datasets. The results are as follows:

+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| Aligned Model ID | MT-Bench | | | Alpaca Eval 2 | | Alpaca Eval 2 | | Arena Hard |
| | | | | (GPT-4-Turbo-1106) | | (Llama-3-8B-Instruct) | | |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| | R1 | R2 | AVG | LC WR | WR | LC WR | WR | Score |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| meta-llama/Meta-Llama-3-8B-Instruct | 8.31 | 7.65 | 7.98 | 22.92 | 22.57 | 50 | 50 | 20.6 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| princeton-nlp/Llama-3-Base-8B-SFT-DPO | 8.12 | 7.23 | 7.67 | 17.71 | 15.34 | 43.73 | 38.80 | 14.8 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| NousResearch/Hermes-2-Pro-Llama-3-8B | 8.05 | 7.35 | 7.70 | 15.60 | 12.86 | 36.37 | 30.52 | 11.5 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| allenai/llama-3-tulu-2-dpo-8b | 7.71 | 7.15 | 7.43 | 14.89 | 14.8 | 35.43 | 35.42 | 11.7 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| cognitivecomputations/dolphin-2.9-llama3-8b | 7.97 | 6.98 | 7.47 | 12.50 | 8.79 | 32.67 | 22.80 | 8.2 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| openchat/openchat-3.6-8b-20240522 | 7.83 | 7.23 | 7.53 | 17.70 | 12.53 | 41.30 | 30.79 | 6.7 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| Magpie-Align/Llama-3-8B-Magpie-Align-v0.1 | 8.01 | 7.63 | 7.82 | 38.52 | 38.47 | 69.37 | 70.05 | 32.4 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| Magpie-Align/Llama-3-8B-Magpie-Align-v0.2 | 7.81 | 7.64 | 7.73 | 49.86 | 51.98 | 75.17 | 78.20 | 37.5 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+
| Magpie-Align/Llama-3-8B-Magpie-Align-v0.3 | 7.82 | 7.51 | 7.67 | 48.58 | 50.36 | 73.65 | 75.81 | 42.2 |
+---------------------------------------------+----------+------+------+--------------------+-------+-----------------------+-------+------------+

👀 Other Information

License: Please follow Meta Llama 3 Community License.

Conversation Template: Please use Llama 3 official chat template for the best performance.

How to use it? Please check the official Llama 3 repository for detailed instructions. Simply replace the original model_id with Magpie-Align/Llama-3-8B-Magpie-Align-SFT-v1.0.

The detailed training pipeline is as follows.

Stage 1: Supervised Fine-tuning

We use Axolotl for SFT.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • total_eval_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 98
  • num_epochs: 2

Training results

Training Loss Epoch Step Validation Loss
0.8616 0.0019 1 0.8870
0.5554 0.2013 106 0.5568
0.5067 0.4027 212 0.5065
0.4728 0.6040 318 0.4865
0.4681 0.8054 424 0.4740
0.4563 1.0067 530 0.4662
0.4115 1.1944 636 0.4642
0.3993 1.3957 742 0.4620
0.4048 1.5971 848 0.4613
0.4167 1.7984 954 0.4611

Framework versions

  • Transformers 4.42.3
  • Pytorch 2.3.1+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1

Internal name for identification: Llama-3-8B-Magpie-Mix-RC. Please change the model name in the below Axolotl config.

👁 Built with Axolotl


Stage 2: Direct Preference Optimization

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 7e-07
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.5813 0.2137 100 0.5238 -2.6816 -3.4539 0.7298 0.7723 -612.4234 -541.2933 -1.1244 -1.1082
0.5021 0.4275 200 0.4483 -3.4053 -4.4858 0.8024 1.0805 -715.6146 -613.6641 -1.1035 -1.0844
0.3802 0.6412 300 0.4069 -3.7974 -5.1705 0.8427 1.3731 -784.0882 -652.8716 -1.1310 -1.1105
0.3827 0.8549 400 0.3872 -4.3693 -5.9670 0.8710 1.5976 -863.7308 -710.0647 -1.1495 -1.1283

Framework versions

  • Transformers 4.42.3
  • Pytorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Downstream Performance (Lighteval)

Datasets Llama-3-8B-Magpie-Align-v0.3
MMLU (5) 65.69
ARC (25) 63.23
HellaSwag (25) 82.15
TruthfulQA (0) 60.97
Winogrande (5) 73.64

Paper Abstract

📚 Citation

If you find the model, data, or code useful, please cite our paper:

@article{xu2024magpie,
 title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
 author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
 year={2024},
 eprint={2406.08464},
 archivePrefix={arXiv},
 primaryClass={cs.CL}
}

Please also cite the creators of preference datasets:

SimPO paper:

@article{meng2024simpo,
 title={{SimPO}: Simple preference optimization with a reference-free reward},
 author={Meng, Yu and Xia, Mengzhou and Chen, Danqi},
 journal={arXiv preprint arXiv:2405.14734},
 year={2024}
}

UltraFeedback paper:

@article{cui2023ultrafeedback,
 title={{UltraFeedback}: Boosting language models with high-quality feedback},
 author={Cui, Ganqu and Yuan, Lifan and Ding, Ning and Yao, Guanming and Zhu, Wei and Ni, Yuan and Xie, Guotong and Liu, Zhiyuan and Sun, Maosong},
 journal={arXiv preprint arXiv:2310.01377},
 year={2023}
}

ArmoRM paper:

@article{wang2024interpretable,
 title={Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts},
 author={Wang, Haoxiang and Xiong, Wei and Xie, Tengyang and Zhao, Han and Zhang, Tong},
 journal={arXiv preprint arXiv:2406.12845},
 year={2024}
}

Questions? Please contact Zhangchen by email.

Downloads last month
7,141
Safetensors
Model size
8B params
Tensor type
BF16
·

Model tree for Magpie-Align/Llama-3-8B-Magpie-Align-v0.3

Finetuned
(1)
this model
Quantizations
1 model

Datasets used to train Magpie-Align/Llama-3-8B-Magpie-Align-v0.3

Spaces using Magpie-Align/Llama-3-8B-Magpie-Align-v0.3 10

Collection including Magpie-Align/Llama-3-8B-Magpie-Align-v0.3

Papers for Magpie-Align/Llama-3-8B-Magpie-Align-v0.3