language: - en license: cc-by-nc-4.0 tags: - merge - lazymergekit - dpo - rlhf dataset: - mlabonne/truthy-dpo-v0.1 - mlabonne/distilabel-intel-orca-dpo-pairs base_model: - mlabonne/Monarch-7B model-index: - name: NeuralMonarch-7B results: - task: type: text-generation name: Text Generation dataset: name: AI2 Reasoning Challenge (25-Shot) type: ai2_arc config: ARC-Challenge split: test args: num_few_shot: 25 metrics: - type: acc_norm value: 73.21 name: normalized accuracy source: url: >- https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: HellaSwag (10-Shot) type: hellaswag split: validation args: num_few_shot: 10 metrics: - type: acc_norm value: 89.09 name: normalized accuracy source: url: >- https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MMLU (5-Shot) type: cais/mmlu config: all split: test args: num_few_shot: 5 metrics: - type: acc value: 64.41 name: accuracy source: url: >- https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: TruthfulQA (0-shot) type: truthful_qa config: multiple_choice split: validation args: num_few_shot: 0 metrics: - type: mc2 value: 77.79 source: url: >- https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: Winogrande (5-shot) type: winogrande config: winogrande_xl split: validation args: num_few_shot: 5 metrics: - type: acc value: 84.61 name: accuracy source: url: >- https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: GSM8k (5-shot) type: gsm8k config: main split: test args: num_few_shot: 5 metrics: - type: acc value: 67.78 name: accuracy source: url: >- https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B name: Open LLM Leaderboard
๐ NeuralMonarch-7B
NeuralMonarch-7B is a DPO fine-tuned of mlabonne/Monarch-7B using the jondurbin/truthy-dpo-v0.1 and argilla/distilabel-intel-orca-dpo-pairs preference datasets.
It is based on a merge of the following models using LazyMergekit:
Special thanks to Jon Durbin, Intel, and Argilla for the preference datasets.
Try the demo: https://huggingface.co/spaces/mlabonne/NeuralMonarch-7B-GGUF-Chat
๐ Applications
This model uses a context window of 8k. I recommend using it with the Mistral Instruct chat template (works perfectly with LM Studio).
Compared to other 7B models, it performs well in instruction following and reasoning tasks. For a chat/RP model with strong reasoning abilities, check out mlabonne/AlphaMonarch-7B.
โก Quantized models
๐ Evaluation
Nous
NeuralMonarch-7B is one of the best-performing 7B models on Nous' benchmark suite (evaluation performed using LLM AutoEval). See the entire leaderboard here.
| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---|---|---|---|---|
| NeuralMonarch-7B ๐ | 62.73 | 45.31 | 76.99 | 78.35 | 50.28 |
| AlphaMonarch-7B ๐ | 62.74 | 45.37 | 77.01 | 78.39 | 50.2 |
| Monarch-7B ๐ | 62.68 | 45.48 | 77.07 | 78.04 | 50.14 |
| teknium/OpenHermes-2.5-Mistral-7B ๐ | 52.42 | 42.75 | 72.99 | 52.99 | 40.94 |
| mlabonne/NeuralHermes-2.5-Mistral-7B ๐ | 53.51 | 43.67 | 73.24 | 55.37 | 41.76 |
| mlabonne/NeuralBeagle14-7B ๐ | 60.25 | 46.06 | 76.77 | 70.32 | 47.86 |
| mlabonne/NeuralOmniBeagle-7B ๐ | 62.3 | 45.85 | 77.26 | 76.06 | 50.03 |
| eren23/dpo-binarized-NeuralTrix-7B ๐ | 62.5 | 44.57 | 76.34 | 79.81 | 49.27 |
| CultriX/NeuralTrix-7B-dpo ๐ | 62.5 | 44.61 | 76.33 | 79.8 | 49.24 |
EQ-bench
NeuralMonarch-7B is also outperforming 70B and 120B parameter models on EQ-bench by Samuel J. Paech, who kindly ran the evaluations.
Open LLM Leaderboard
NeuralMonarch-7B is one of the best-performing 7B models on the Open LLM Leaderboard.
MT-Bench
########## First turn ##########
score
model turn
gpt-4 1 8.95625
OmniBeagle-7B 1 8.31250
AlphaMonarch-7B 1 8.23750
claude-v1 1 8.15000
NeuralMonarch-7B 1 8.09375
gpt-3.5-turbo 1 8.07500
claude-instant-v1 1 7.80000
########## Second turn ##########
score
model turn
gpt-4 2 9.025000
claude-instant-v1 2 8.012658
OmniBeagle-7B 2 7.837500
gpt-3.5-turbo 2 7.812500
claude-v1 2 7.650000
AlphaMonarch-7B 2 7.618750
NeuralMonarch-7B 2 7.375000
########## Average ##########
score
model
gpt-4 8.990625
OmniBeagle-7B 8.075000
gpt-3.5-turbo 7.943750
AlphaMonarch-7B 7.928125
claude-instant-v1 7.905660
claude-v1 7.900000
NeuralMonarch-7B 7.734375
NeuralBeagle14-7B 7.628125
๐ป Usage
!pip install -qU transformers accelerate
from transformers import AutoTokenizer
import transformers
import torch
model = "mlabonne/NeuralMonarch-7B"
messages = [{"role": "user", "content": "What is a large language model?"}]
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
"text-generation",
model=model,
torch_dtype=torch.float16,
device_map="auto",
)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
