dfurman/Llama-3-8B-Orpo-v0.1
This is an ORPO fine-tune of meta-llama/Meta-Llama-3-8B on 4k samples of mlabonne/orpo-dpo-mix-40k.
It's a successful fine-tune that follows the ChatML template!
Application
This model uses a context window of 8k. It was trained with the ChatML template.
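As an illustration of the ChatML template mentioned above, the following is a minimal sketch of how a message list is laid out in that format. The helper `format_chatml` is hypothetical and written only for this example; the tokenizer's own `apply_chat_template` remains the authoritative source for the exact prompt, which may differ in details such as special tokens added at the start.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the generic ChatML layout.

    Hypothetical helper for illustration; not part of the model's code.
    """
    # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    # An open assistant turn tells the model that it should respond next.
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = format_chatml(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a recipe for a spicy margarita."},
    ]
)
print(example)
```

Printing `example` shows one `<|im_start|>` / `<|im_end|>` pair per message, followed by an open assistant turn.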
Evaluation
Open LLM Leaderboard
| Model ID | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|---|
| meta-llama/Meta-Llama-3-8B-Instruct | 66.87 | 60.75 | 78.55 | 67.07 | 51.65 | 74.51 | 68.69 |
| dfurman/Llama-3-8B-Orpo-v0.1 | 64.67 | 60.67 | 82.56 | 66.59 | 50.47 | 79.01 | 48.75 |
| meta-llama/Meta-Llama-3-8B | 62.35 | 59.22 | 82.02 | 66.49 | 43.95 | 77.11 | 45.34 |
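The Average column in the table above appears to be the unweighted mean of the six benchmark scores. The sketch below recomputes it from the table's own numbers as a sanity check; the dictionary layout and variable names are chosen for this example only.

```python
# Benchmark scores copied from the table above, in column order:
# ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K
scores = {
    "meta-llama/Meta-Llama-3-8B-Instruct": [60.75, 78.55, 67.07, 51.65, 74.51, 68.69],
    "dfurman/Llama-3-8B-Orpo-v0.1": [60.67, 82.56, 66.59, 50.47, 79.01, 48.75],
    "meta-llama/Meta-Llama-3-8B": [59.22, 82.02, 66.49, 43.95, 77.11, 45.34],
}

# Unweighted mean over the six benchmarks for each model.
averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:.3f}")
```

The recomputed means agree with the reported averages to within rounding in the second decimal place.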
Training curves
You can find the experiment on W&B at this address.
Usage
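```python
import torch
import transformers
from transformers import AutoTokenizer

# Assumed setup: the snippet below needs `tokenizer` and `pipeline` to be defined.
model = "dfurman/Llama-3-8B-Orpo-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline("text-generation", model=model, torch_dtype=torch.float16, device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a recipe for a spicy margarita."},
]
# Render the chat as a single prompt string ending with an open assistant turn.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print("***Prompt:\n", prompt)
outputs = pipeline(prompt, max_new_tokens=1000, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
# The pipeline returns prompt plus completion, so slice off the prompt.
print("***Generation:\n", outputs[0]["generated_text"][len(prompt):])
```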
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dfurman__Llama-3-8B-Orpo-v0.1)
| Metric | Value |
|---|---|
| Avg. | 11.01 |
| IFEval (0-Shot) | 30.00 |
| BBH (3-Shot) | 13.77 |
| MATH Lvl 5 (4-Shot) | 3.78 |
| GPQA (0-shot) | 1.57 |
| MuSR (0-shot) | 2.73 |
| MMLU-PRO (5-shot) | 14.23 |
Model size: 8B params. Tensor type: F16 (Safetensors).
Evaluation results
- strict accuracy on IFEval (0-Shot), Open LLM Leaderboard: 30.000
- normalized accuracy on BBH (3-Shot), Open LLM Leaderboard: 13.770
- exact match on MATH Lvl 5 (4-Shot), Open LLM Leaderboard: 3.780
- acc_norm on GPQA (0-shot), Open LLM Leaderboard: 1.570
- acc_norm on MuSR (0-shot), Open LLM Leaderboard: 2.730
- accuracy on MMLU-PRO (5-shot), test set, Open LLM Leaderboard: 14.230
