Mistral-Nemo-NT-Ko-12B-sft

Description

Mistral-Nemo-NT-Ko-12B-sft is an instruction-tuned version of mistralai/Mistral-Nemo-Base-2407, fine-tuned across four languages: English, Korean, Chinese, and Japanese.

The primary goals of this model are language alignment, cross-lingual knowledge transfer, and ChatML formatting. This is an intermediate version: preference optimization has not yet been applied.

Features

The base model supports a 128K context length, but I fine-tuned this model with an 8K context size.
The model responds in the input language unless the user explicitly specifies an output language. (If the output language is set only in the system prompt, that instruction may be ignored.)
Answer length tends to vary by language: English responses are generally longer than average, while Korean responses tend to be shorter. The behavior for Japanese and Chinese is still under observation.
Recommended temperature settings: 0.3 to 0.7.
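The sketch below shows one way the settings above might translate into sampling arguments. It builds keyword arguments of the kind accepted by Hugging Face transformers' `generate()` (`do_sample`, `temperature`, `max_new_tokens`). It rejects temperatures outside the recommended 0.3 to 0.7 range and caps the completion so that prompt plus output stays within the 8K fine-tuning window.

Some parts are illustrative assumptions rather than documented behavior:
- Reading "8K" as 8192 tokens.
- Capping output at that limit. The base model itself supports 128K.
- The helper name `generation_kwargs`, which is hypothetical.

```python
TRAINED_CONTEXT = 8192  # assumption: the "8K" fine-tuning context taken as 8192 tokens
TEMPERATURE_RANGE = (0.3, 0.7)  # recommended range stated in this card


def generation_kwargs(prompt_tokens: int, max_new_tokens: int,
                      temperature: float = 0.5) -> dict:
    """Hypothetical helper: build sampling kwargs for transformers' generate()."""
    low, high = TEMPERATURE_RANGE
    if not low <= temperature <= high:
        raise ValueError(f"temperature {temperature} is outside {low}-{high}")
    budget = TRAINED_CONTEXT - prompt_tokens
    if budget <= 0:
        raise ValueError("prompt alone exceeds the 8K fine-tuning context")
    # Cap the completion so prompt + output stays within the fine-tuning window.
    return {
        "do_sample": True,
        "temperature": temperature,
        "max_new_tokens": min(max_new_tokens, budget),
    }


if __name__ == "__main__":
    print(generation_kwargs(prompt_tokens=8000, max_new_tokens=500))
```

With an 8000-token prompt, `max_new_tokens` is capped at 192. Whether to cap at 8K or let generation run longer is a judgment call, since behavior beyond the fine-tuning length has not been characterized here.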

Evaluation

LogicKor

Model	Method	Reasoning	Math	Writing	Coding	Comprehension	Grammar	Single-turn	Multi-turn	Total
Mistral-Nemo-NT-Ko-12B-sft	cot-1-shot	7.36	6.57	8.71	8.57	9.57	6.43	7.81	7.93	7.87
Mistral-Nemo-NT-Ko-12B-sft	1-shot	9.00	5.71	7.93	8.29	7.93	5.21	7.29	7.40	7.35
Mistral Nemo	1-shot	5.00	6.50	6.86	8.07	7.64	8.43	7.60	6.57	7.08
Mistral Nemo	cot-1-shot	5.43	6.86	6.07	7.57	5.86	7.57	7.50	5.62	6.56
Mistral-Nemo-NT-Ko-12B-sft	default	6.00	4.93	5.43	7.14	9.71	4.00	6.45	5.95	6.20
Mistral Nemo	default	0.43	7.64	6.21	7.14	6.79	7.21	6.26	5.55	5.90
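In every row, the Total (총점) column matches the mean of the Single-turn (싱글턴) and Multi-turn (멀티턴) columns, allowing for two-decimal rounding. For example, the first row gives (7.81 + 7.93) / 2 = 7.87. This is an observation from the reported numbers, not something the card states. The short check below reproduces it, using values transcribed by hand from the table.

```python
# (single-turn, multi-turn, total) columns transcribed from the LogicKor table
LOGICKOR = {
    ("Mistral-Nemo-NT-Ko-12B-sft", "cot-1-shot"): (7.81, 7.93, 7.87),
    ("Mistral-Nemo-NT-Ko-12B-sft", "1-shot"): (7.29, 7.40, 7.35),
    ("Mistral Nemo", "1-shot"): (7.60, 6.57, 7.08),
    ("Mistral Nemo", "cot-1-shot"): (7.50, 5.62, 6.56),
    ("Mistral-Nemo-NT-Ko-12B-sft", "default"): (6.45, 5.95, 6.20),
    ("Mistral Nemo", "default"): (6.26, 5.55, 5.90),
}


def total_is_turn_mean(single: float, multi: float, total: float,
                       tol: float = 0.01) -> bool:
    """True if total equals the single/multi-turn mean, with a small
    tolerance to absorb two-decimal rounding."""
    return abs((single + multi) / 2 - total) <= tol


if __name__ == "__main__":
    for (model, method), (single, multi, total) in LOGICKOR.items():
        mean = (single + multi) / 2
        print(f"{model:28} {method:11} mean={mean:.3f} total={total:.2f} "
              f"ok={total_is_turn_mean(single, multi, total)}")
```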

MT-Bench

Model	First turn	Second turn	Average
Mistral-Nemo-NT-Ko-12B-sft	8.39	7.99	8.19
* `judge-model: GPT-4`

Language Confusion (Korean only)

Model	Monolingual-LPR	Monolingual-WPR	Crosslingual-LPR	Crosslingual-WPR
Mistral-Nemo-NT-Ko-12B-sft	100.00%	99.00%	87.51%	96.96%
Mistral-Nemo-Instruct-2407	90.72%	93.18%	46.75%	92.84%
Meta-Llama-3.1-8B-Instruct	99.00%	96.97%	91.45%	93.01%
gemma-2-9b-it	100.00%	98.00%	87.93%	95.58%

Prompt template (ChatML) example:

<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

I trained Mistral-Nemo-NT-Ko-12B with various system prompts drawn from dozens of datasets, so you can chat with or without your own system prompt.
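The template above can also be rendered programmatically. Below is a minimal plain-Python sketch, assuming the exact newline placement shown in the example; the function name `build_chatml_prompt` is hypothetical. In practice, transformers' `tokenizer.apply_chat_template` is preferable when the tokenizer ships a chat template, though this card does not say whether this repository's tokenizer config includes one.

```python
from typing import Dict, List

ROLES = ("system", "user", "assistant")


def build_chatml_prompt(messages: List[Dict[str, str]],
                        add_generation_prompt: bool = True) -> str:
    """Hypothetical helper: render role/content messages as ChatML text."""
    parts = []
    for msg in messages:
        role = msg["role"]
        if role not in ROLES:
            raise ValueError(f"unsupported role: {role}")
        parts.append(f"<|im_start|>{role}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_chatml_prompt([
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Hello!"},
    ]))
```

Leaving out the system message produces a prompt that starts directly at the user turn, consistent with the note that the model can be used with or without a system prompt.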

Dataset

werty1248/multilingual-instruct-balanced

Training Details

GPUs: 8x A40
Epochs: 3
Total batch size: 8
Learning rate: 7e-6
Weight decay: 0.01
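The figures above imply a very small per-device batch. Assume plain data parallelism, where Axolotl's effective batch size is `micro_batch_size × gradient_accumulation_steps × number of GPUs`. Then a total batch of 8 on 8 GPUs leaves exactly one sample per GPU per optimizer step.

The sketch below works through that arithmetic. The dataset size passed to `total_optimizer_steps` is a hypothetical input, because the card does not state how many examples were trained on. If sample packing was used, it would be the number of packed 8K sequences instead.

```python
import math

# Hyperparameters stated in the Training Details above
N_GPUS = 8
TOTAL_BATCH_SIZE = 8
EPOCHS = 3


def per_gpu_samples_per_step(n_gpus: int = N_GPUS,
                             total_batch: int = TOTAL_BATCH_SIZE) -> int:
    """micro_batch_size * gradient_accumulation_steps on each GPU,
    assuming plain data parallelism (total = n_gpus * micro * accum)."""
    if total_batch % n_gpus:
        raise ValueError("total batch size must be divisible by the GPU count")
    return total_batch // n_gpus


def total_optimizer_steps(num_examples: int,
                          total_batch: int = TOTAL_BATCH_SIZE,
                          epochs: int = EPOCHS) -> int:
    """Optimizer steps for a hypothetical training-set size (not given in this card)."""
    return math.ceil(num_examples / total_batch) * epochs


if __name__ == "__main__":
    print(per_gpu_samples_per_step())      # 1: micro batch 1, no accumulation
    print(total_optimizer_steps(100_000))  # 37500 (12500 steps per epoch x 3 epochs)
```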

Built with Axolotl

Training loss

[Figure: training loss curve]


Model size: 12B params (Safetensors)
Tensor type: BF16


URL: https://huggingface.co/werty1248/Mistral-Nemo-NT-Ko-12B-sft
