Hy-MT2-1.8B-kjh-ru-lora

A Khakas ↔ Russian machine translation model created by LoRA fine-tuning tencent/Hy-MT2-1.8B on a Khakas–Russian parallel corpus.

Khakas (Хакас тілі) is a low-resource Turkic language spoken in the Republic of Khakassia, Russia.

Quick Start

Requirements: transformers>=5.6.0, torch, and accelerate (needed for device_map="auto")

from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
import torch

model_name = "adeshkin/Hy-MT2-1.8B-kjh-ru-lora"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Load the merged checkpoint in bf16 and let accelerate place it on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

src_lang = "Khakas"
tgt_lang = "Russian"
text = '54. "Ат ӱгредерде арғамҷың пик ползын, чонға чоохтирда чооғың сын ползын" сӧспектің тузазын чарыда пас пиріңер.'
prompt = f"Translate the following {src_lang} text into {tgt_lang}, output only the translation result without additional explanation:\n\n{text}"

messages = [{"role": "user", "content": prompt}]
# return_dict=True makes the result a mapping with input_ids and attention_mask,
# which the ** unpacking and the inputs["input_ids"] lookup below rely on.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

model.eval()
with torch.no_grad():
    set_seed(19)  # only matters if the generation config samples
    outputs = model.generate(
        **inputs,
        max_new_tokens=4096,
    )

# Skip the prompt tokens and decode only the newly generated ones.
result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(result)
# 54. Расскажите подробно значение пословицы "Чтобы выучить коня, у тебя должна быть крепкая веревка, чтобы говорить народу - твоя речь была правдивой."

To translate in the opposite direction, set src_lang = "Russian" and tgt_lang = "Khakas".
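Since only the two language names change between directions, the prompt construction can be wrapped in a small helper. The sketch below reuses the prompt wording from the Quick Start; the names PROMPT_TEMPLATE and build_messages are illustrative and not part of the model or of transformers.

```python
# Prompt wording copied from the Quick Start example.
PROMPT_TEMPLATE = (
    "Translate the following {src_lang} text into {tgt_lang}, "
    "output only the translation result without additional explanation:\n\n{text}"
)


def build_messages(text, src_lang="Khakas", tgt_lang="Russian"):
    """Return a single-turn chat message list for one translation request.

    build_messages is a hypothetical helper name used for illustration.
    """
    prompt = PROMPT_TEMPLATE.format(src_lang=src_lang, tgt_lang=tgt_lang, text=text)
    return [{"role": "user", "content": prompt}]


# Khakas -> Russian is the default; swap the arguments for Russian -> Khakas.
kjh_to_ru = build_messages("Изен!")
ru_to_kjh = build_messages("Привет!", src_lang="Russian", tgt_lang="Khakas")
```

The returned list can be passed to tokenizer.apply_chat_template exactly as messages is in the Quick Start.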

Base Model

Hy-MT2-1.8B is a decoder-only multilingual translation model developed by Tencent. It belongs to the Hy-MT2 family and supports translation among 33 languages.

Fine-Tuning

The model was fine-tuned using LoRA (Low-Rank Adaptation) applied to the attention projections (q, k, v, o). The LoRA weights were merged back into the base model for inference.
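As a rough illustration, the adapter setup can be written down as a plain configuration dictionary using the rank, alpha, and dropout listed under Training Hyperparameters. The key names follow the argument names commonly used for LoRA configurations, and the module names q_proj, k_proj, v_proj, and o_proj are an assumption about how the attention projections are named in the base model; the card only says "q, k, v, o". The scaling factor assumes standard LoRA scaling (alpha divided by rank) rather than a rank-stabilized variant.

```python
# Values taken from the Training Hyperparameters list in this card.
lora_config = {
    "r": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    # Assumed module names for the attention projections (q, k, v, o).
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}

# With standard LoRA scaling, the low-rank update B @ A is multiplied by
# alpha / r before being added to the frozen weight, both during training
# and when the adapter is merged: W_merged = W + scaling * (B @ A).
scaling = lora_config["lora_alpha"] / lora_config["r"]
print(f"LoRA scaling factor: {scaling}")
```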

Training Hyperparameters

LoRA rank: 64
LoRA alpha: 128
LoRA dropout: 0.05
Max sequence length: 4096
Batch size: 2 (per device) with 16 gradient accumulation steps
Learning rate: 2e-4
LR scheduler: Cosine with minimum LR (1e-5)
Warmup ratio: 0.01
Max steps: 30,000
Precision: bf16
Hardware: 1x NVIDIA RTX 4060 Ti (8GB VRAM)
Training time: ~12 hours
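
The learning-rate settings above can be visualized with a short sketch. It assumes linear warmup over the first 1% of steps followed by cosine decay from the peak rate down to the minimum rate; the exact scheduler implementation is not stated in this card, so treat the function lr_at as an approximation of the schedule rather than the training code.

```python
import math

PEAK_LR = 2e-4        # Learning rate
MIN_LR = 1e-5         # Minimum LR of the cosine schedule
MAX_STEPS = 30_000    # Max steps
WARMUP_STEPS = round(0.01 * MAX_STEPS)  # Warmup ratio 0.01


def lr_at(step):
    """Approximate learning rate at a given optimizer step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from PEAK_LR at the end of warmup to MIN_LR at MAX_STEPS.
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


for step in (0, WARMUP_STEPS, MAX_STEPS // 2, MAX_STEPS):
    print(step, f"{lr_at(step):.2e}")
```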

For full training details and scripts, see the khakas-mt repository.

Training Data

The training corpus consists of ~160k parallel sentence pairs (~320k training examples after creating both translation directions: kjh→ru and ru→kjh). Data is formatted as instruction-following chat messages (JSONL) and shuffled.

Source	Pairs	Link
Khakas–Russian Parallel Corpus	159,213	adeshkin/khakas-russian-parallel-corpus
Google SmolSent	863	adeshkin/google-smol-en-ru-kjh (smolsent)
Google SmolDoc	825	adeshkin/google-smol-en-ru-kjh (smoldoc)
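
The counts in the table can be combined with the batch settings from Training Hyperparameters to estimate how many passes over the data the 30,000 steps correspond to. This is a back-of-the-envelope sketch: it assumes a single device (as listed under Hardware), one training example per sequence with no packing, and that every optimizer step uses a full batch.

```python
# Pair counts from the table above.
pairs = {
    "Khakas–Russian Parallel Corpus": 159_213,
    "Google SmolSent": 863,
    "Google SmolDoc": 825,
}

total_pairs = sum(pairs.values())
# Each pair is used in both directions (kjh→ru and ru→kjh).
total_examples = 2 * total_pairs

# Batch size 2 per device with 16 gradient accumulation steps, one device.
effective_batch = 2 * 16
max_steps = 30_000

examples_seen = max_steps * effective_batch
approx_epochs = examples_seen / total_examples

print(f"pairs: {total_pairs}, examples: {total_examples}")
print(f"examples seen: {examples_seen}, approx. epochs: {approx_epochs:.2f}")
```

Under these assumptions the run covers the bidirectional training set roughly three times.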

Evaluation

Evaluated on the FLORES+ devtest split (1,012 sentence pairs) using SacreBLEU:

Direction	BLEU	chrF++
kjh → ru	21.09	46.18
ru → kjh	16.82	48.86

The FLORES+ dev split (997 sentences) was used for validation during training.

License

This model is distributed under the Apache 2.0 License, as the base model is licensed under the same terms.


URL: https://huggingface.co/adeshkin/Hy-MT2-1.8B-kjh-ru-lora
