language:
  - en
license: cc-by-nc-4.0
tags:
  - merge
  - lazymergekit
  - dpo
  - rlhf
dataset:
  - mlabonne/truthy-dpo-v0.1
  - mlabonne/distilabel-intel-orca-dpo-pairs
base_model:
  - mlabonne/Monarch-7B
model-index:
  - name: NeuralMonarch-7B
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 73.21
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 89.09
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 64.41
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 77.79
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 84.61
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 67.78
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B
          name: Open LLM Leaderboard

👑 NeuralMonarch-7B

NeuralMonarch-7B is a DPO fine-tune of mlabonne/Monarch-7B using the jondurbin/truthy-dpo-v0.1 and argilla/distilabel-intel-orca-dpo-pairs preference datasets.

It is based on a merge of the following models using LazyMergekit:

Special thanks to Jon Durbin, Intel, and Argilla for the preference datasets.

Try the demo: https://huggingface.co/spaces/mlabonne/NeuralMonarch-7B-GGUF-Chat

🔍 Applications

This model uses a context window of 8k tokens. I recommend using it with the Mistral Instruct chat template (works perfectly with LM Studio).
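To make the recommended prompt format concrete, here is a minimal sketch of how a Mistral Instruct style prompt is laid out. The function name `format_mistral_instruct` is hypothetical, and the exact whitespace around `[INST]` and `</s>` is an assumption based on the commonly documented Mistral Instruct layout. The chat template shipped with the model's tokenizer remains the authoritative source.

```python
def format_mistral_instruct(messages, bos="<s>", eos="</s>"):
    """Render alternating user/assistant messages as one prompt string.

    Hypothetical helper for illustration only; in practice prefer
    tokenizer.apply_chat_template, which uses the model's own template.
    """
    parts = [bos]
    for i, message in enumerate(messages):
        # The format expects strict user/assistant alternation, user first.
        expected_role = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected_role:
            raise ValueError(f"message {i} should have role {expected_role!r}")
        if message["role"] == "user":
            # User turns are wrapped in instruction markers.
            parts.append(f"[INST] {message['content']} [/INST]")
        else:
            # Assistant turns follow the marker and end with the EOS token.
            parts.append(f" {message['content']}{eos}")
    return "".join(parts)


single_turn = format_mistral_instruct(
    [{"role": "user", "content": "What is a large language model?"}]
)
multi_turn = format_mistral_instruct(
    [
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Bye"},
    ]
)
print(single_turn)
print(multi_turn)
```

A prompt that ends with `[/INST]` leaves the next assistant turn open for the model to complete.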

Compared to other 7B models, it performs well in instruction following and reasoning tasks. For a chat/RP model with strong reasoning abilities, check out mlabonne/AlphaMonarch-7B.

⚡ Quantized models

GGUF: https://huggingface.co/mlabonne/NeuralMonarch-7B-GGUF

🏆 Evaluation

Nous

NeuralMonarch-7B is one of the best-performing 7B models on Nous' benchmark suite (evaluation performed using LLM AutoEval). See the entire leaderboard here.

Model	Average	AGIEval	GPT4All	TruthfulQA	Bigbench
NeuralMonarch-7B 📄	62.73	45.31	76.99	78.35	50.28
AlphaMonarch-7B 📄	62.74	45.37	77.01	78.39	50.2
Monarch-7B 📄	62.68	45.48	77.07	78.04	50.14
teknium/OpenHermes-2.5-Mistral-7B 📄	52.42	42.75	72.99	52.99	40.94
mlabonne/NeuralHermes-2.5-Mistral-7B 📄	53.51	43.67	73.24	55.37	41.76
mlabonne/NeuralBeagle14-7B 📄	60.25	46.06	76.77	70.32	47.86
mlabonne/NeuralOmniBeagle-7B 📄	62.3	45.85	77.26	76.06	50.03
eren23/dpo-binarized-NeuralTrix-7B 📄	62.5	44.57	76.34	79.81	49.27
CultriX/NeuralTrix-7B-dpo 📄	62.5	44.61	76.33	79.8	49.24
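
The Average column appears to be the unweighted mean of the four suite scores. That is an inference from the numbers in the table, not something the card states. A short sketch that recomputes it for the three Monarch models, with scores copied from the rows above (the names `nous_scores` and `nous_average` are illustrative):

```python
# Scores copied from the table: AGIEval, GPT4All, TruthfulQA, Bigbench.
nous_scores = {
    "NeuralMonarch-7B": [45.31, 76.99, 78.35, 50.28],
    "AlphaMonarch-7B": [45.37, 77.01, 78.39, 50.20],
    "Monarch-7B": [45.48, 77.07, 78.04, 50.14],
}


def nous_average(scores):
    """Unweighted mean of the suite scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


averages = {name: nous_average(scores) for name, scores in nous_scores.items()}
for name, average in averages.items():
    print(f"{name}: {average}")
```

Under this assumption the three models land within 0.06 points of each other, so their ordering in this table should not be over-interpreted.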

EQ-bench

NeuralMonarch-7B also outperforms 70B and 120B parameter models on EQ-bench, a benchmark by Samuel J. Paech, who kindly ran the evaluations.

Open LLM Leaderboard

NeuralMonarch-7B is one of the best-performing 7B models on the Open LLM Leaderboard.

MT-Bench

########## First turn ##########
                          score
model             turn
gpt-4             1     8.95625
OmniBeagle-7B     1     8.31250
AlphaMonarch-7B   1     8.23750
claude-v1         1     8.15000
NeuralMonarch-7B  1     8.09375
gpt-3.5-turbo     1     8.07500
claude-instant-v1 1     7.80000

########## Second turn ##########
                           score
model             turn
gpt-4             2     9.025000
claude-instant-v1 2     8.012658
OmniBeagle-7B     2     7.837500
gpt-3.5-turbo     2     7.812500
claude-v1         2     7.650000
AlphaMonarch-7B   2     7.618750
NeuralMonarch-7B  2     7.375000

########## Average ##########
                      score
model
gpt-4              8.990625
OmniBeagle-7B      8.075000
gpt-3.5-turbo      7.943750
AlphaMonarch-7B    7.928125
claude-instant-v1  7.905660
claude-v1          7.900000
NeuralMonarch-7B   7.734375
NeuralBeagle14-7B  7.628125

💻 Usage

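# Notebook cell syntax (e.g. Colab); in a regular shell, drop the leading "!".
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "mlabonne/NeuralMonarch-7B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Build the prompt string with the chat template bundled with the tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision and let accelerate place it on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens; the returned text includes the prompt.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])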
