Mistral-Nemo-NT-Ko-12B-sft
Description
Mistral-Nemo-NT-Ko-12B-sft is an instruction-tuned version of mistralai/Mistral-Nemo-Base-2407, fine-tuned across four languages: English, Korean, Chinese, and Japanese.
The primary goals of this model are language alignment, cross-lingual knowledge transfer and ChatML formatting. This is an intermediate version since preference optimization has not yet been applied.
Features
The base model supports a context length of 128K, but I fine-tuned this model with an 8K context length.
The model responds in the language of the input unless the user explicitly specifies an output language. (An output language set only in the system prompt may be ignored.)
Answer length tends to vary by language: English responses are generally longer than average, while Korean responses tend to be shorter. The behavior for Japanese and Chinese is still under observation.
Recommended temperature settings: 0.3 to 0.7.
Evaluation
LogicKor
| Model | Method | Reasoning | Math | Writing | Coding | Understanding | Grammar | Single-turn | Multi-turn | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Mistral-Nemo-NT-Ko-12B-sft | cot-1-shot | 7.36 | 6.57 | 8.71 | 8.57 | 9.57 | 6.43 | 7.81 | 7.93 | 7.87 |
| Mistral-Nemo-NT-Ko-12B-sft | 1-shot | 9.00 | 5.71 | 7.93 | 8.29 | 7.93 | 5.21 | 7.29 | 7.40 | 7.35 |
| Mistral Nemo | 1-shot | 5.00 | 6.50 | 6.86 | 8.07 | 7.64 | 8.43 | 7.60 | 6.57 | 7.08 |
| Mistral Nemo | cot-1-shot | 5.43 | 6.86 | 6.07 | 7.57 | 5.86 | 7.57 | 7.50 | 5.62 | 6.56 |
| Mistral-Nemo-NT-Ko-12B-sft | default | 6.00 | 4.93 | 5.43 | 7.14 | 9.71 | 4.00 | 6.45 | 5.95 | 6.20 |
| Mistral Nemo | default | 0.43 | 7.64 | 6.21 | 7.14 | 6.79 | 7.21 | 6.26 | 5.55 | 5.90 |
MT-Bench
| Model | First turn | Second turn | Average |
|---|---|---|---|
| Mistral-Nemo-NT-Ko-12B-sft | 8.39 | 7.99 | 8.19 |
* Judge model: GPT-4
Language Confusion (Korean only)
LPR: line-level pass rate; WPR: word-level pass rate.
| Model | Monolingual-LPR | Monolingual-WPR | Crosslingual-LPR | Crosslingual-WPR |
|---|---|---|---|---|
| Mistral-Nemo-NT-Ko-12B-sft | 100.00% | 99.00% | 87.51% | 96.96% |
| Mistral-Nemo-Instruct-2407 | 90.72% | 93.18% | 46.75% | 92.84% |
| Meta-Llama-3.1-8B-Instruct | 99.00% | 96.97% | 91.45% | 93.01% |
| gemma-2-9b-it | 100.00% | 98.00% | 87.93% | 95.58% |
Example:
```
<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
I trained Mistral-Nemo-NT-Ko-12B with a variety of system prompts drawn from dozens of datasets, so you can chat with or without your own system prompt.
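If you build prompts by hand instead of relying on a tokenizer chat template, the ChatML layout above can be assembled with a small helper. This is a minimal sketch, not code from the model repository: the function name `to_chatml` is hypothetical, and it assumes the common ChatML convention of a newline after each role header and after each `<|im_end|>`. The system message is optional, matching the note that the model works with or without one.

```python
from typing import Dict, List

# Roles that appear in the ChatML template for this model.
ALLOWED_ROLES = {"system", "user", "assistant"}


def to_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render a list of {"role", "content"} messages as a ChatML string.

    Hypothetical helper; assumes a newline follows every role header and
    every <|im_end|> token (standard ChatML layout).
    """
    chunks = []
    for msg in messages:
        role = msg["role"]
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {role!r}")
        chunks.append(f"<|im_start|>{role}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        chunks.append("<|im_start|>assistant\n")
    return "".join(chunks)


if __name__ == "__main__":
    # With a system prompt.
    print(to_chatml([
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "안녕하세요? 자기소개 부탁해요."},
    ]))
    # Without a system prompt.
    print(to_chatml([{"role": "user", "content": "Hello!"}]))
```

With Hugging Face `transformers`, the resulting string can be tokenized and passed to `model.generate(...)` with `do_sample=True` and a `temperature` in the recommended 0.3 to 0.7 range. If the repository's tokenizer defines a chat template, `tokenizer.apply_chat_template` is the simpler route.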
Dataset
werty1248/multilingual-instruct-balanced
Training Details
- GPU: 8xA40
- epoch: 3
- total batch size: 8
- learning rate: 7e-6
- weight decay: 0.01
- Training loss (plot not reproduced here)
Model tree for werty1248/Mistral-Nemo-NT-Ko-12B-sft
Base model
mistralai/Mistral-Nemo-Base-2407