VOOZH about

URL: https://huggingface.co/Magpie-Align/Llama-3.1-8B-Magpie-Align-v0.2

โ‡ฑ Magpie-Align/Llama-3.1-8B-Magpie-Align-v0.2 ยท Hugging Face


๐Ÿ‘ Magpie

๐Ÿ”ฅ Chat with Magpie Here!

๐Ÿฆ Llama-3.1-8B-Magpie-Align-v0.2

Project Web: https://magpie-align.github.io/

Online Model Demo: https://huggingface.co/spaces/flydust/Chat-with-Magpie

Arxiv Technical Report: https://arxiv.org/abs/2406.08464

Codes: https://github.com/magpie-align/magpie

๐Ÿง About This Model

This model is an aligned version of meta-llama/Meta-Llama-3.1-8B. We apply the following pipeline:

We first perform SFT using:

We then perform DPO on the Magpie-Align/Llama-3.1-70B-PO-100K-armorm dataset.

The overall performance is much better than the official Llama-3.1-8B-Instruct Model!

  • Alpaca Eval 2 (vs GPT-4-Turbo-1106): 46.68 (LC), 53.42 (WR)
  • Arena Hard: 43.2

๐Ÿ‘€ Other Information

License: Please follow Meta Llama 3.1 Community License.

Conversation Template: Please use Llama 3 official chat template for the best performance.

How to use it? Please check the official Llama 3.1 repository for detailed instructions. Simply replace the original model_id with this model id.


Alignment Pipeline

The detailed alignment pipeline is as follows.

Stage 1: Supervised Fine-tuning

We use Axolotl for SFT. Please refer to the model card of SFT checkpoint for detailed configurations.

Stage 2: Direct Preference Optimization

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Rewards/rejected Rewards/accuracies Rewards/margins Logps/rejected Logps/chosen Logits/rejected Logits/chosen
0.5603 0.1306 100 0.5762 -1.0828 -1.5526 0.7620 0.4698 -505.3885 -452.3145 -0.7241 -0.7285
0.5441 0.2612 200 0.4445 -3.4116 -5.1002 0.8360 1.6886 -860.1481 -685.1905 -0.6966 -0.6964
0.3586 0.3919 300 0.3949 -3.4100 -5.2798 0.8720 1.8698 -878.1118 -685.0309 -0.7677 -0.7653
0.3737 0.5225 400 0.3653 -4.3580 -6.6737 0.8760 2.3157 -1017.5 -779.8291 -0.7777 -0.7711
0.2611 0.6531 500 0.3457 -4.9017 -7.6712 0.8860 2.7695 -1117.2515 -834.2015 -0.8137 -0.8074
0.3342 0.7837 600 0.3354 -4.7041 -7.3342 0.8920 2.6301 -1083.5503 -814.4402 -0.8081 -0.7999
0.3251 0.9144 700 0.3335 -4.8366 -7.5394 0.8880 2.7028 -1104.0730 -827.6954 -0.8119 -0.8042

Framework versions

  • Transformers 4.43.3
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Paper Abstract

๐Ÿ“š Citation

If you find the model, data, or code useful, please cite our paper:

@article{xu2024magpie,
 title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
 author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
 year={2024},
 eprint={2406.08464},
 archivePrefix={arXiv},
 primaryClass={cs.CL}
}

Please also cite the reward model for creating preference datasets:

ArmoRM paper:

@article{wang2024interpretable,
 title={Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts},
 author={Wang, Haoxiang and Xiong, Wei and Xie, Tengyang and Zhao, Han and Zhang, Tong},
 journal={arXiv preprint arXiv:2406.12845},
 year={2024}
}

Questions? Please contact Zhangchen by email.

Downloads last month
8
Safetensors
Model size
8B params
Tensor type
BF16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for Magpie-Align/Llama-3.1-8B-Magpie-Align-v0.2

Finetuned
(1)
this model
Quantizations
2 models

Dataset used to train Magpie-Align/Llama-3.1-8B-Magpie-Align-v0.2

Collection including Magpie-Align/Llama-3.1-8B-Magpie-Align-v0.2

Papers for Magpie-Align/Llama-3.1-8B-Magpie-Align-v0.2