Hy-MT2-1.8B-kjh-ru-lora
A Khakas ↔ Russian machine translation model created by LoRA fine-tuning tencent/Hy-MT2-1.8B on a Khakas–Russian parallel corpus.
Khakas (Хакас тілі) is a low-resource Turkic language spoken in the Republic of Khakassia, Russia.
Quick Start
Requirements: transformers>=5.6.0, torch
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

model_name = "adeshkin/Hy-MT2-1.8B-kjh-ru-lora"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

src_lang = "Khakas"
tgt_lang = "Russian"
text = '54. "Ат ӱгредерде арғамҷың пик ползын, чонға чоохтирда чооғың сын ползын" сӧспектің тузазын чарыда пас пиріңер.'
prompt = f"Translate the following {src_lang} text into {tgt_lang}, output only the translation result without additional explanation:\n\n{text}"
messages = [{"role": "user", "content": prompt}]

# return_dict=True is passed explicitly so the call returns a mapping
# (input_ids, attention_mask) that can be unpacked into generate().
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

model.eval()
with torch.no_grad():
    set_seed(19)  # fixed seed, so any sampling during generation is reproducible
    outputs = model.generate(
        **inputs,
        max_new_tokens=4096,
    )

# Decode only the newly generated tokens, skipping the prompt.
result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(result)
# 54. Расскажите подробно значение пословицы "Чтобы выучить коня, у тебя должна быть крепкая веревка, чтобы говорить народу - твоя речь была правдивой."
```
To translate in the opposite direction, set src_lang = "Russian" and tgt_lang = "Khakas".
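Since both directions share the same prompt wording, a small helper can keep it consistent. The following is a minimal sketch: `build_messages` and `PROMPT_TEMPLATE` are hypothetical names introduced here for illustration, not part of the model's API, and the template text is copied from the Quick Start example.

```python
# Prompt wording copied from the Quick Start example.
PROMPT_TEMPLATE = (
    "Translate the following {src_lang} text into {tgt_lang}, "
    "output only the translation result without additional explanation:\n\n{text}"
)


def build_messages(text, src_lang="Khakas", tgt_lang="Russian"):
    """Return a single-turn chat message list for one translation request.

    Hypothetical helper: it only builds the prompt and does not call the model.
    """
    if src_lang == tgt_lang:
        raise ValueError("src_lang and tgt_lang must differ")
    prompt = PROMPT_TEMPLATE.format(src_lang=src_lang, tgt_lang=tgt_lang, text=text)
    return [{"role": "user", "content": prompt}]


# Russian -> Khakas: swap the two language names.
messages = build_messages("Здравствуйте!", src_lang="Russian", tgt_lang="Khakas")
```

The returned `messages` list can be passed to `tokenizer.apply_chat_template` in place of the hand-built list.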
Base Model
Hy-MT2-1.8B is a decoder-only multilingual translation model developed by Tencent. It belongs to the Hy-MT2 family and supports translation among 33 languages.
Fine-Tuning
The model was fine-tuned using LoRA (Low-Rank Adaptation) applied to the attention projections (q, k, v, o). The LoRA weights were merged back into the base model for inference.
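For reference, merging under the standard LoRA formulation can be written as below. This assumes the usual alpha / r scaling (the card does not say whether a variant such as rsLoRA was used). W_0 is a frozen projection weight, and A and B are the trained low-rank factors; the values of r and alpha are the ones listed under Training Hyperparameters, which gives a scaling factor of 128 / 64 = 2.

```latex
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d_{\text{out}} \times r},\;
A \in \mathbb{R}^{r \times d_{\text{in}}},\;
r = 64,\; \alpha = 128,\; \frac{\alpha}{r} = 2
```

After merging, each of the q, k, v, and o projections is again a single dense matrix, so inference needs no adapter-specific code.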
Training Hyperparameters
- LoRA rank: 64
- LoRA alpha: 128
- LoRA dropout: 0.05
- Max sequence length: 4096
- Batch size: 2 (per device) with 16 gradient accumulation steps
- Learning rate: 2e-4
- LR scheduler: Cosine with minimum LR (1e-5)
- Warmup ratio: 0.01
- Max steps: 30,000
- Precision: bf16
- Hardware: 1x NVIDIA RTX 4060 Ti (8GB VRAM)
- Training time: ~12 hours
For full training details and scripts, see the khakas-mt repository.
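The hyperparameters above imply an effective batch size and a learning-rate curve, which can be worked out as follows. This is a sketch under stated assumptions: linear warmup followed by cosine decay to the minimum LR is the common form of this schedule, but the exact scheduler implementation used in training is not given here. The function and constant names are illustrative.

```python
import math

# Values taken from the hyperparameter list.
MAX_LR = 2e-4
MIN_LR = 1e-5
MAX_STEPS = 30_000
WARMUP_RATIO = 0.01
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 16
NUM_DEVICES = 1


def effective_batch_size():
    """Sequences contributing to each optimizer step."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step):
    """Learning rate at an optimizer step.

    Assumed shape: linear warmup from 0 to MAX_LR, then cosine decay to MIN_LR.
    """
    warmup_steps = round(WARMUP_RATIO * MAX_STEPS)
    if step < warmup_steps:
        return MAX_LR * step / warmup_steps
    progress = (step - warmup_steps) / (MAX_STEPS - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (MAX_LR - MIN_LR) * cosine


print(effective_batch_size())  # 2 * 16 * 1
print(lr_at(300), lr_at(MAX_STEPS))
```

Under these assumptions the warmup lasts 300 steps and each optimizer step sees 32 sequences.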
Training Data
The training corpus consists of ~160k parallel sentence pairs (~320k training examples after creating both translation directions: kjh→ru and ru→kjh). Data is formatted as instruction-following chat messages (JSONL) and shuffled.
| Source | Pairs | Link |
|---|---|---|
| Khakas–Russian Parallel Corpus | 159,213 | adeshkin/khakas-russian-parallel-corpus |
| Google SmolSent | 863 | adeshkin/google-smol-en-ru-kjh (smolsent) |
| Google SmolDoc | 825 | adeshkin/google-smol-en-ru-kjh (smoldoc) |
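The step from sentence pairs to bidirectional chat examples can be sketched as follows. Two things here are assumptions rather than facts from this card: that the training prompt matches the inference prompt from Quick Start, and that each JSONL record stores its turns under a `messages` key. The function names and placeholder texts are illustrative only.

```python
import json
import random

# Assumption: training used the same prompt wording as the Quick Start example.
PROMPT_TEMPLATE = (
    "Translate the following {src_lang} text into {tgt_lang}, "
    "output only the translation result without additional explanation:\n\n{text}"
)
LANG_NAMES = {"kjh": "Khakas", "ru": "Russian"}


def pair_to_examples(kjh_text, ru_text):
    """Turn one parallel pair into two chat examples (kjh->ru and ru->kjh)."""
    texts = {"kjh": kjh_text, "ru": ru_text}
    examples = []
    for src, tgt in (("kjh", "ru"), ("ru", "kjh")):
        prompt = PROMPT_TEMPLATE.format(
            src_lang=LANG_NAMES[src], tgt_lang=LANG_NAMES[tgt], text=texts[src]
        )
        # Assumption: the "messages" key is a hypothetical record layout.
        examples.append({"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": texts[tgt]},
        ]})
    return examples


def to_jsonl(pairs, seed=0):
    """Expand all pairs in both directions, shuffle, and serialize as JSONL."""
    examples = [ex for kjh, ru in pairs for ex in pair_to_examples(kjh, ru)]
    random.Random(seed).shuffle(examples)
    return "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples)


# Placeholder strings stand in for real corpus sentences.
print(to_jsonl([("<kjh sentence 1>", "<ru sentence 1>")]))
```

Applied to the table above, 159,213 + 863 + 825 = 160,901 pairs would yield 321,802 examples, which matches the rounded figures of ~160k and ~320k.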
Evaluation
Evaluated on the FLORES+ devtest split (1,012 sentence pairs) using SacreBLEU:
| Direction | BLEU | chrF++ |
|---|---|---|
| kjh → ru | 21.09 | 46.18 |
| ru → kjh | 16.82 | 48.86 |
The FLORES+ dev split (997 sentences) was used for validation during training.
License
This model is distributed under the Apache 2.0 License, as the base model is licensed under the same terms.