URL: https://huggingface.co/mlabonne/NeuralMonarch-7B/blob/main/README.md

metadata
language:
  - en
license: cc-by-nc-4.0
tags:
  - merge
  - lazymergekit
  - dpo
  - rlhf
dataset:
  - mlabonne/truthy-dpo-v0.1
  - mlabonne/distilabel-intel-orca-dpo-pairs
base_model:
  - mlabonne/Monarch-7B
model-index:
  - name: NeuralMonarch-7B
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 73.21
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 89.09
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 64.41
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 77.79
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 84.61
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 67.78
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralMonarch-7B
          name: Open LLM Leaderboard

๐Ÿ‘ image/jpeg

👑 NeuralMonarch-7B

NeuralMonarch-7B is a DPO fine-tune of mlabonne/Monarch-7B, trained on the jondurbin/truthy-dpo-v0.1 and argilla/distilabel-intel-orca-dpo-pairs preference datasets.

It is based on a merge of the following models using LazyMergekit:

Special thanks to Jon Durbin, Intel, and Argilla for the preference datasets.
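
As background on the fine-tuning method named above: DPO trains directly on (chosen, rejected) answer pairs from preference datasets. The sketch below computes the standard per-pair DPO loss from summed log-probabilities. It illustrates the published objective only; it is not the training code used for this model, and the function name, the `beta=0.1` default, and the example numbers are assumptions made for illustration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of one answer under the
    policy being trained or under the frozen reference model.
    beta=0.1 is a hypothetical default, not a value taken from this card.
    """
    # How much more the policy prefers "chosen" over "rejected",
    # measured relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # -log(sigmoid(z)) written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# Hypothetical log-probabilities: the policy favours the chosen answer
# more than the reference does, so the loss drops below log(2).
example = dpo_loss(-10.0, -12.0, -11.0, -11.0)
print(example < math.log(2))
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log(2); training pushes it lower by widening the margin in favour of the chosen answers.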

Try the demo: https://huggingface.co/spaces/mlabonne/NeuralMonarch-7B-GGUF-Chat

🔍 Applications

This model uses a context window of 8k. I recommend using it with the Mistral Instruct chat template (works perfectly with LM Studio).
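
To make the recommended template concrete, here is a minimal sketch of how a conversation is laid out in the Mistral Instruct format. The helper name is hypothetical, and the exact whitespace and BOS/EOS handling are assumptions that differ slightly between template versions; the chat template shipped with the tokenizer (`tokenizer.apply_chat_template`) remains the authoritative source.

```python
def build_mistral_instruct_prompt(messages, bos="<s>", eos="</s>"):
    """Lay out alternating user/assistant turns in Mistral Instruct style.

    Hypothetical helper for illustration only: spacing around the
    [INST] markers is an assumption and may not match every tokenizer.
    """
    prompt = bos
    for i, message in enumerate(messages):
        expected_role = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected_role:
            raise ValueError("roles must alternate user/assistant, starting with user")
        if message["role"] == "user":
            # User turns are wrapped in instruction markers.
            prompt += f"[INST] {message['content']} [/INST]"
        else:
            # Assistant turns are closed with the end-of-sequence token.
            prompt += f"{message['content']}{eos}"
    return prompt

messages = [{"role": "user", "content": "What is a large language model?"}]
print(build_mistral_instruct_prompt(messages))
```

In LM Studio or a similar frontend, selecting the built-in Mistral Instruct preset should produce an equivalent layout without any manual string handling.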

Compared to other 7B models, it performs well in instruction following and reasoning tasks. For a chat/RP model with strong reasoning abilities, check out mlabonne/AlphaMonarch-7B.

⚡ Quantized models

๐Ÿ† Evaluation

Nous

NeuralMonarch-7B is one of the best-performing 7B models on Nous' benchmark suite (evaluation performed using LLM AutoEval). See the entire leaderboard here.

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---:|---:|---:|---:|---:|
| NeuralMonarch-7B | 62.73 | 45.31 | 76.99 | 78.35 | 50.28 |
| AlphaMonarch-7B | 62.74 | 45.37 | 77.01 | 78.39 | 50.2 |
| Monarch-7B | 62.68 | 45.48 | 77.07 | 78.04 | 50.14 |
| teknium/OpenHermes-2.5-Mistral-7B | 52.42 | 42.75 | 72.99 | 52.99 | 40.94 |
| mlabonne/NeuralHermes-2.5-Mistral-7B | 53.51 | 43.67 | 73.24 | 55.37 | 41.76 |
| mlabonne/NeuralBeagle14-7B | 60.25 | 46.06 | 76.77 | 70.32 | 47.86 |
| mlabonne/NeuralOmniBeagle-7B | 62.3 | 45.85 | 77.26 | 76.06 | 50.03 |
| eren23/dpo-binarized-NeuralTrix-7B | 62.5 | 44.57 | 76.34 | 79.81 | 49.27 |
| CultriX/NeuralTrix-7B-dpo | 62.5 | 44.61 | 76.33 | 79.8 | 49.24 |
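
The Average column appears to be the plain arithmetic mean of the four suite scores. The check below reproduces it for three rows copied from the table; that reading is inferred from the numbers and is not stated by the card.

```python
from statistics import mean

# (AGIEval, GPT4All, TruthfulQA, Bigbench) scores copied from the table above.
nous_scores = {
    "NeuralMonarch-7B": (45.31, 76.99, 78.35, 50.28),
    "AlphaMonarch-7B": (45.37, 77.01, 78.39, 50.2),
    "Monarch-7B": (45.48, 77.07, 78.04, 50.14),
}

# Unweighted mean over the four benchmarks, rounded like the table.
nous_averages = {name: round(mean(scores), 2) for name, scores in nous_scores.items()}

for name, average in nous_averages.items():
    print(f"{name}: {average}")
```

The three Monarch variants land within 0.06 points of each other, so the ranking among them on this suite should not be over-interpreted.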

EQ-bench

NeuralMonarch-7B also outperforms 70B and 120B parameter models on EQ-bench, a benchmark by Samuel J. Paech, who kindly ran the evaluations.

๐Ÿ‘ image/png

Open LLM Leaderboard

NeuralMonarch-7B is one of the best-performing 7B models on the Open LLM Leaderboard.
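
The six Open LLM Leaderboard scores listed in this card's metadata can be combined into a single figure. The sketch below assumes the headline average is the unweighted mean of those six benchmarks; that is an assumption on my part, since the card itself does not state how the leaderboard aggregates them.

```python
# Scores copied from the model-index metadata of this card.
open_llm_scores = {
    "ARC (25-shot)": 73.21,
    "HellaSwag (10-shot)": 89.09,
    "MMLU (5-shot)": 64.41,
    "TruthfulQA (0-shot)": 77.79,
    "Winogrande (5-shot)": 84.61,
    "GSM8k (5-shot)": 67.78,
}

# Assumed aggregation: simple mean over the six benchmarks.
open_llm_average = round(sum(open_llm_scores.values()) / len(open_llm_scores), 2)
print(open_llm_average)
```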

MT-Bench

########## First turn ##########
                         score
model              turn
gpt-4              1     8.95625
OmniBeagle-7B      1     8.31250
AlphaMonarch-7B    1     8.23750
claude-v1          1     8.15000
NeuralMonarch-7B   1     8.09375
gpt-3.5-turbo      1     8.07500
claude-instant-v1  1     7.80000

########## Second turn ##########
                         score
model              turn
gpt-4              2     9.025000
claude-instant-v1  2     8.012658
OmniBeagle-7B      2     7.837500
gpt-3.5-turbo      2     7.812500
claude-v1          2     7.650000
AlphaMonarch-7B    2     7.618750
NeuralMonarch-7B   2     7.375000

########## Average ##########
                   score
model
gpt-4              8.990625
OmniBeagle-7B      8.075000
gpt-3.5-turbo      7.943750
AlphaMonarch-7B    7.928125
claude-instant-v1  7.905660
claude-v1          7.900000
NeuralMonarch-7B   7.734375
NeuralBeagle14-7B  7.628125
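
The Average block can be related to the two per-turn blocks with a little arithmetic. The sketch below assumes the overall score is the mean over all judged answers, with 80 questions per turn (the size of the MT-Bench question set). Under that assumption NeuralMonarch-7B's average is simply the midpoint of its two turn scores, while the claude-instant-v1 figure is reproduced only if one second-turn judgment is missing (79 instead of 80). The 79 is inferred from the numbers, not stated anywhere in this card.

```python
def overall_average(turn1_mean, n_turn1, turn2_mean, n_turn2):
    """Mean over all judged answers, given per-turn means and counts."""
    total = turn1_mean * n_turn1 + turn2_mean * n_turn2
    return total / (n_turn1 + n_turn2)

# Equal counts: the overall score is the midpoint of the two turns.
neural_monarch = overall_average(8.09375, 80, 7.375, 80)

# Assumed unequal counts (80 and 79) for claude-instant-v1.
claude_instant = overall_average(7.8, 80, 8.012658, 79)

print(round(neural_monarch, 6))
print(round(claude_instant, 5))
```

Read this way, NeuralMonarch-7B's lower overall rank comes mainly from the second turn, where it scores 7.375 against 8.09375 on the first.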

💻 Usage

# Notebook cell; in a regular shell, drop the leading "!"
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "mlabonne/NeuralMonarch-7B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the conversation with the chat template bundled with the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision and let accelerate place it on the available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
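
By default, the text-generation pipeline returns the prompt followed by the completion in `generated_text`. If only the model's reply is wanted, a small helper like the one below can strip the prompt. The helper name is hypothetical and the strings in the example are made up; passing `return_full_text=False` to the pipeline call should achieve the same result without post-processing.

```python
def extract_reply(generated_text: str, prompt: str) -> str:
    """Return only the completion, dropping the leading prompt if present.

    Hypothetical helper for illustration; it does not call the model.
    """
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    # Fall back to the full text if the prompt is not a prefix.
    return generated_text.strip()

# Made-up strings standing in for a real prompt and pipeline output.
demo_prompt = "<s>[INST] What is a large language model? [/INST]"
demo_output = demo_prompt + " A large language model is a neural network trained on text."
print(extract_reply(demo_output, demo_prompt))
```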