🚀 [May 9, 2024] We're excited to introduce Llama3-70B-Chinese-Chat! Full-parameter fine-tuned on a mixed Chinese-English dataset of ~100K preference pairs, its Chinese performance surpasses ChatGPT and matches GPT-4, as shown by C-Eval and CMMLU results. Llama3-70B-Chinese-Chat is much more powerful than Llama3-8B-Chinese-Chat. If you love our Llama3-8B-Chinese-Chat, you must have a try on our Llama3-70B-Chinese-Chat!

🌟 We included all instructions on how to download, use, and reproduce our various kinds of models at this GitHub repo. If you like our models, we would greatly appreciate it if you could star our Github repository. Additionally, please click "like" on our HuggingFace repositories. Thank you!

❗️❗️❗️NOTICE: The main branch contains the files for Llama3-8B-Chinese-Chat-v2.1. If you want to use our Llama3-8B-Chinese-Chat-v1, please refer to the v1 branch; if you want to use our Llama3-8B-Chinese-Chat-v2, please refer to the v2 branch.

❗️❗️❗️NOTICE: For optimal performance, we refrain from fine-tuning the model's identity. Thus, inquiries such as "Who are you" or "Who developed you" may yield random responses that are not necessarily accurate.

Updates

🚀🚀🚀 [May 6, 2024] We now introduce Llama3-8B-Chinese-Chat-v2.1! Compared to v1, the training dataset of v2.1 is 5x larger (~100K preference pairs), and it exhibits significant enhancements, especially in roleplay, function calling, and math capabilities! Compared to v2, v2.1 surpasses v2 in math and is less prone to including English words in Chinese responses. The training dataset of Llama3-8B-Chinese-Chat-v2.1 will be released soon. If you love our Llama3-8B-Chinese-Chat-v1 or v2, you won't want to miss out on Llama3-8B-Chinese-Chat-v2.1!
🔥 We provide an online interactive demo for Llama3-8B-Chinese-Chat-v2 here. Have fun with our latest model!
🔥 We provide the official Ollama model for the q4_0 GGUF version of Llama3-8B-Chinese-Chat-v2.1 at wangshenzhi/llama3-8b-chinese-chat-ollama-q4! Run the following command for quick use of this model: ollama run wangshenzhi/llama3-8b-chinese-chat-ollama-q4.
🔥 We provide the official Ollama model for the q8_0 GGUF version of Llama3-8B-Chinese-Chat-v2.1 at wangshenzhi/llama3-8b-chinese-chat-ollama-q8! Run the following command for quick use of this model: ollama run wangshenzhi/llama3-8b-chinese-chat-ollama-q8.
🔥 We provide the official Ollama model for the f16 GGUF version of Llama3-8B-Chinese-Chat-v2.1 at wangshenzhi/llama3-8b-chinese-chat-ollama-fp16! Run the following command for quick use of this model: ollama run wangshenzhi/llama3-8b-chinese-chat-ollama-fp16.
🔥 We provide the official q4_0 GGUF version of Llama3-8B-Chinese-Chat-v2.1 at https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-4bit!
🔥 We provide the official q8_0 GGUF version of Llama3-8B-Chinese-Chat-v2.1 at https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-8bit!
🔥 We provide the official f16 GGUF version of Llama3-8B-Chinese-Chat-v2.1 at https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-f16!

Model Summary

Llama3-8B-Chinese-Chat is an instruction-tuned language model for Chinese & English users with various abilities such as roleplaying & tool-using built upon the Meta-Llama-3-8B-Instruct model.

Developers: Shenzhi Wang*, Yaowei Zheng*, Guoyin Wang (in.ai), Shiji Song, Gao Huang. (*: Equal Contribution)

License: Llama-3 License
Base Model: Meta-Llama-3-8B-Instruct
Model Size: 8.03B
Context length: 8K

1. Introduction

This is the first model specifically fine-tuned for Chinese & English user through ORPO [1] based on the Meta-Llama-3-8B-Instruct model.

Compared to the original Meta-Llama-3-8B-Instruct model, our Llama3-8B-Chinese-Chat-v1 model significantly reduces the issues of "Chinese questions with English answers" and the mixing of Chinese and English in responses.

Compared to Llama3-8B-Chinese-Chat-v1, our Llama3-8B-Chinese-Chat-v2 model significantly increases the training data size (from 20K to 100K), which introduces great performance enhancement, especially in roleplay, tool using, and math.

[1] Hong, Jiwoo, Noah Lee, and James Thorne. "Reference-free Monolithic Preference Optimization with Odds Ratio." arXiv preprint arXiv:2403.07691 (2024).

Training framework: LLaMA-Factory.

Training details:

epochs: 2
learning rate: 3e-6
learning rate scheduler type: cosine
Warmup ratio: 0.1
cutoff len (i.e. context length): 8192
orpo beta (i.e. $\lambda$ in the ORPO paper): 0.05
global batch size: 128
fine-tuning type: full parameters
optimizer: paged_adamw_32bit

2. Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "shenzhi-wang/Llama3-8B-Chinese-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
 model_id, torch_dtype="auto", device_map="auto"
)

messages = [
 {"role": "user", "content": "写一首诗吧"},
]

input_ids = tokenizer.apply_chat_template(
 messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
 input_ids,
 max_new_tokens=8192,
 do_sample=True,
 temperature=0.6,
 top_p=0.9,
)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))

3. Examples

The following are some examples generated by Llama3-8B-Chinese-Chat-v2.1, including examples of role playing, function calling, math, RuoZhiBa (弱智吧), safety, writing, and coding, etc.

For the examples generated by Llama3-8B-Chinese-Chat-v1, please refer to this link.

For the examples generated by Llama3-8B-Chinese-Chat-v2, please refer to this link.

Citation

If our Llama3-8B-Chinese-Chat is helpful, please kindly cite as:

@misc {shenzhi_wang_2024,
 author = {Wang, Shenzhi and Zheng, Yaowei and Wang, Guoyin and Song, Shiji and Huang, Gao},
 title = { Llama3-8B-Chinese-Chat (Revision 6622a23) },
 year = 2024,
 url = { https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat },
 doi = { 10.57967/hf/2316 },
 publisher = { Hugging Face }
}