🔥 Chat with Magpie Here!

🐦 Llama-3.1-8B-Magpie-Align-v0.2

Project Web: https://magpie-align.github.io/

Online Model Demo: https://huggingface.co/spaces/flydust/Chat-with-Magpie

Arxiv Technical Report: https://arxiv.org/abs/2406.08464

Codes: https://github.com/magpie-align/magpie

🧐 About This Model

This model is an aligned version of meta-llama/Meta-Llama-3.1-8B. We apply the following pipeline:

We first perform SFT using:

Magpie-Align/Magpie-Llama-3.1-Pro-500K-Filtered
Magpie-Align/Magpie-Reasoning-150K
SFT Model Checkpoint: Magpie-Align/Llama-3.1-8B-Magpie-Align-SFT-v0.2

We then perform DPO on the Magpie-Align/Llama-3.1-70B-PO-100K-armorm dataset.

The overall performance is much better than the official Llama-3.1-8B-Instruct Model!

Alpaca Eval 2 (vs GPT-4-Turbo-1106): 46.68 (LC), 53.42 (WR)
Arena Hard: 43.2

👀 Other Information

License: Please follow Meta Llama 3.1 Community License.

Conversation Template: Please use Llama 3 official chat template for the best performance.

How to use it? Please check the official Llama 3.1 repository for detailed instructions. Simply replace the original model_id with this model id.

Alignment Pipeline

The detailed alignment pipeline is as follows.

Stage 1: Supervised Fine-tuning

We use Axolotl for SFT. Please refer to the model card of SFT checkpoint for detailed configurations.

Stage 2: Direct Preference Optimization

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 2
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 16
total_train_batch_size: 128
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.5603	0.1306	100	0.5762	-1.0828	-1.5526	0.7620	0.4698	-505.3885	-452.3145	-0.7241	-0.7285
0.5441	0.2612	200	0.4445	-3.4116	-5.1002	0.8360	1.6886	-860.1481	-685.1905	-0.6966	-0.6964
0.3586	0.3919	300	0.3949	-3.4100	-5.2798	0.8720	1.8698	-878.1118	-685.0309	-0.7677	-0.7653
0.3737	0.5225	400	0.3653	-4.3580	-6.6737	0.8760	2.3157	-1017.5	-779.8291	-0.7777	-0.7711
0.2611	0.6531	500	0.3457	-4.9017	-7.6712	0.8860	2.7695	-1117.2515	-834.2015	-0.8137	-0.8074
0.3342	0.7837	600	0.3354	-4.7041	-7.3342	0.8920	2.6301	-1083.5503	-814.4402	-0.8081	-0.7999
0.3251	0.9144	700	0.3335	-4.8366	-7.5394	0.8880	2.7028	-1104.0730	-827.6954	-0.8119	-0.8042

Framework versions

Transformers 4.43.3
Pytorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1

Paper Abstract

📚 Citation

If you find the model, data, or code useful, please cite our paper:

@article{xu2024magpie,
 title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, 
 author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
 year={2024},
 eprint={2406.08464},
 archivePrefix={arXiv},
 primaryClass={cs.CL}
}

Please also cite the reward model for creating preference datasets:

ArmoRM paper:

@article{wang2024interpretable,
 title={Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts},
 author={Wang, Haoxiang and Xiong, Wei and Xie, Tengyang and Zhao, Han and Zhang, Tong},
 journal={arXiv preprint arXiv:2406.12845},
 year={2024}
}

Questions? Please contact Zhangchen by email.

Downloads last month: 8

Safetensors

Model size

8B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Magpie-Align/Llama-3.1-8B-Magpie-Align-v0.2

Base model

meta-llama/Llama-3.1-8B

Finetuned

Magpie-Align/Llama-3.1-8B-Magpie-Align-SFT-v0.2

Finetuned

(1)

this model

Quantizations

2 models

Dataset used to train Magpie-Align/Llama-3.1-8B-Magpie-Align-v0.2

Collection including Magpie-Align/Llama-3.1-8B-Magpie-Align-v0.2

Open-aligned models using Magpie datasets. • 11 items • Updated Jan 13, 2025 • 1

Papers for Magpie-Align/Llama-3.1-8B-Magpie-Align-v0.2

Paper • 2406.12845 • Published Jun 18, 2024 • 1

Paper • 2406.08464 • Published Jun 12, 2024 • 72

URL: https://huggingface.co/Magpie-Align/Llama-3.1-8B-Magpie-Align-v0.2

⇱ Magpie-Align/Llama-3.1-8B-Magpie-Align-v0.2 · Hugging Face