URL: https://huggingface.co/elinas/Meta-Llama-3-120B-Instruct-4.0bpw-exl2


Meta-Llama-3-120B-Instruct

Meta-Llama-3-120B-Instruct is a self-merge of meta-llama/Meta-Llama-3-70B-Instruct.

It was inspired by large merges like:

๐Ÿ” Applications

I recommend using this model for creative writing. It uses the Llama 3 chat template and has a default context window of 8K tokens, which can be extended by adjusting the RoPE theta.

Check the examples in the evaluation section to get an idea of its performance.
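For readers unfamiliar with the Llama 3 chat template mentioned above, here is a minimal sketch of the prompt string it produces. The helper name `format_llama3_prompt` is hypothetical and the function is only a hand-rolled approximation for illustration; in practice the tokenizer's own `apply_chat_template` should be treated as the source of truth.

```python
def format_llama3_prompt(messages, add_generation_prompt=True):
    """Hand-rolled approximation of the Llama 3 chat format (illustrative only)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header, a blank line, the trimmed content, then <|eot_id|>
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header asks the model to write the next turn
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [{"role": "user", "content": "What is a large language model?"}]
prompt = format_llama3_prompt(messages)
print(prompt)
```

Seeing the raw string can help when a backend expects a pre-formatted prompt rather than a list of messages.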

⚡ Quantized models

Thanks to Eric Hartford, elinas, and the mlx-community for providing these models.

๐Ÿ† Evaluation

The model looks excellent for creative writing tasks, outperforming GPT-4. Thanks again to Eric Hartford for noticing this.

🧩 Configuration

slices:
- sources:
  - layer_range: [0, 20]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [10, 30]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [20, 40]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [30, 50]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [40, 60]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [50, 70]
    model: meta-llama/Meta-Llama-3-70B-Instruct
- sources:
  - layer_range: [60, 80]
    model: meta-llama/Meta-Llama-3-70B-Instruct
merge_method: passthrough
dtype: float16
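To make the effect of the overlapping slices concrete, the sketch below counts how many layers the merged model ends up with and which source layers appear twice. It assumes each `layer_range` is half-open (start included, end excluded) and that a passthrough merge simply stacks the slices in order; the variable and function names are illustrative and not part of any merge tool.

```python
from collections import Counter

# Layer ranges copied from the configuration above, read as half-open [start, end)
slices = [(0, 20), (10, 30), (20, 40), (30, 50), (40, 60), (50, 70), (60, 80)]


def stacked_layers(ranges):
    """Return the source-layer indices in the order they are stacked."""
    order = []
    for start, end in ranges:
        order.extend(range(start, end))
    return order


order = stacked_layers(slices)
counts = Counter(order)

total_layers = len(order)
duplicated = sorted(layer for layer, n in counts.items() if n == 2)

print(f"merged depth: {total_layers} layers")
print(f"source layers used twice: {duplicated[0]}..{duplicated[-1]} ({len(duplicated)} layers)")
```

Under these assumptions, the 80 source layers become a 140-layer stack in which only the first and last ten source layers appear once, which would account for the larger parameter count.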

💻 Usage

# Notebook shell command; in a regular terminal, drop the leading "!"
!pip install -qU transformers accelerate

import torch
import transformers
from transformers import AutoTokenizer

model = "mlabonne/Llama-3-120B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Render the conversation into a prompt string using the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision and spread it across the available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])