
URL: https://huggingface.co/adeshkin/Hy-MT2-1.8B-kjh-ru-lora



Hy-MT2-1.8B-kjh-ru-lora

A Khakas ↔ Russian machine translation model created by LoRA fine-tuning tencent/Hy-MT2-1.8B on a Khakas–Russian parallel corpus.

Khakas (Хакас тілі) is a low-resource Turkic language spoken in the Republic of Khakassia, Russia.

Quick Start

Requirements: transformers>=5.6.0, torch

from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
import torch

model_name = "adeshkin/Hy-MT2-1.8B-kjh-ru-lora"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

src_lang = "Khakas"
tgt_lang = "Russian"
text = '54. "Ат ӱгредерде арғамҷың пик ползын, чонға чоохтирда чооғың сын ползын" сӧспектің тузазын чарыда пас пиріңер.'
prompt = f"Translate the following {src_lang} text into {tgt_lang}, output only the translation result without additional explanation:\n\n{text}"

messages = [{"role": "user", "content": prompt}]
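inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,  # return a dict with input_ids and attention_mask, as generate(**inputs) expects
    return_tensors="pt",
).to(model.device)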

model.eval()
with torch.no_grad():
    set_seed(19)
    outputs = model.generate(
        **inputs,
        max_new_tokens=4096,
    )

# Decode only the newly generated tokens, skipping the prompt tokens.
result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(result)
# 54. Расскажите подробно значение пословицы "Чтобы выучить коня, у тебя должна быть крепкая веревка, чтобы говорить народу - твоя речь была правдивой."

To translate in the opposite direction, set src_lang = "Russian" and tgt_lang = "Khakas".
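Both directions use the same prompt template, so the prompt construction can be wrapped in a small helper. The sketch below is illustrative only: `PROMPT_TEMPLATE` and `build_messages` are hypothetical names that are not part of this repository, and the template string is copied from Quick Start.

```python
# Prompt template copied from the Quick Start example.
PROMPT_TEMPLATE = (
    "Translate the following {src_lang} text into {tgt_lang}, "
    "output only the translation result without additional explanation:\n\n{text}"
)


def build_messages(text: str, src_lang: str, tgt_lang: str) -> list[dict[str, str]]:
    """Return a single-turn chat in the shape passed to apply_chat_template."""
    prompt = PROMPT_TEMPLATE.format(src_lang=src_lang, tgt_lang=tgt_lang, text=text)
    return [{"role": "user", "content": prompt}]


# Russian -> Khakas: same template, language names swapped.
ru_to_kjh = build_messages("Здравствуйте!", "Russian", "Khakas")
print(ru_to_kjh[0]["content"])
```

The returned list can be passed as `messages` in the Quick Start code without further changes.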

Base Model

Hy-MT2-1.8B is a decoder-only multilingual translation model developed by Tencent. It belongs to the Hy-MT2 family and supports translation among 33 languages.

Fine-Tuning

The model was fine-tuned with LoRA (Low-Rank Adaptation) applied to the attention projections (q, k, v, o). The LoRA weights were then merged back into the base model, so the published checkpoint loads as a regular model, as in Quick Start.
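To make "merged back into the base model" concrete, here is a minimal pure-Python sketch of the merge for a single weight matrix. It assumes the standard LoRA scaling `alpha / r` (the card does not say whether a variant such as rank-stabilized scaling was used), and it uses a toy rank-1 adapter on a 2x2 matrix for readability. Only the rank and alpha values come from the hyperparameters listed on this card; `matmul` and `merge_lora` are hypothetical helper names.

```python
LORA_RANK = 64    # from the card
LORA_ALPHA = 128  # from the card
SCALE = LORA_ALPHA / LORA_RANK  # 2.0, assuming standard alpha / r scaling


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_a, lora_b, scale):
    """Return the merged weight W + scale * (B @ A)."""
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example: 2x2 weight with a rank-1 adapter (the real adapter has rank 64).
w = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[0.5], [0.0]]  # shape (2, 1)
lora_a = [[1.0, 2.0]]    # shape (1, 2)

merged = merge_lora(w, lora_a, lora_b, SCALE)
print(merged)  # [[2.0, 2.0], [0.0, 1.0]]
```

After merging, the adapter matrices are no longer needed at inference time, because their contribution is already part of the weight.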

Training Hyperparameters

  • LoRA rank: 64
  • LoRA alpha: 128
  • LoRA dropout: 0.05
  • Max sequence length: 4096
  • Batch size: 2 (per device) with 16 gradient accumulation steps
  • Learning rate: 2e-4
  • LR scheduler: Cosine with minimum LR (1e-5)
  • Warmup ratio: 0.01
  • Max steps: 30,000
  • Precision: bf16
  • Hardware: 1x NVIDIA RTX 4060 Ti (8GB VRAM)
  • Training time: ~12 hours
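The batch settings above imply an effective batch size and a rough number of passes over the data. The arithmetic below is a sketch under two assumptions that the card does not state: that "Max steps" counts optimizer steps (after gradient accumulation), and that each sequence holds one training example (no packing). The example count is the approximate ~320k figure from the Training Data section.

```python
per_device_batch = 2
grad_accum_steps = 16
num_devices = 1           # single GPU, per the hardware entry
max_steps = 30_000        # assumed to be optimizer steps
train_examples = 320_000  # approximate, both directions combined

effective_batch = per_device_batch * grad_accum_steps * num_devices
examples_seen = effective_batch * max_steps
approx_epochs = examples_seen / train_examples

print(effective_batch)  # 32
print(examples_seen)    # 960000
print(approx_epochs)    # 3.0
```

Under these assumptions, training covers roughly three passes over the corpus.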

For full training details and scripts, see the khakas-mt repository.

Training Data

The training corpus consists of ~160k parallel sentence pairs (~320k training examples after creating both translation directions: kjh→ru and ru→kjh). Data is formatted as instruction-following chat messages (JSONL) and shuffled.
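As an illustration of how one parallel pair can become two chat-formatted training examples, consider the sketch below. The exact schema of the training files is not given on this card: the `messages` key, the reuse of the Quick Start prompt as the training prompt, and the helper name `make_examples` are all assumptions, and the sentences are placeholders rather than corpus data.

```python
import json
import random

# Assumption: the training prompt matches the inference prompt from Quick Start.
PROMPT = (
    "Translate the following {src_lang} text into {tgt_lang}, "
    "output only the translation result without additional explanation:\n\n{text}"
)
LANG_NAMES = {"kjh": "Khakas", "ru": "Russian"}


def make_examples(kjh_text: str, ru_text: str) -> list[dict]:
    """Turn one parallel pair into two chat examples, one per direction."""
    pair = {"kjh": kjh_text, "ru": ru_text}
    examples = []
    for src, tgt in (("kjh", "ru"), ("ru", "kjh")):
        user = PROMPT.format(
            src_lang=LANG_NAMES[src], tgt_lang=LANG_NAMES[tgt], text=pair[src]
        )
        examples.append(
            {
                "messages": [  # hypothetical schema
                    {"role": "user", "content": user},
                    {"role": "assistant", "content": pair[tgt]},
                ]
            }
        )
    return examples


# Placeholder pairs standing in for real corpus rows.
pairs = [
    ("<Khakas sentence 1>", "<Russian sentence 1>"),
    ("<Khakas sentence 2>", "<Russian sentence 2>"),
]
examples = [ex for kjh, ru in pairs for ex in make_examples(kjh, ru)]
random.Random(0).shuffle(examples)  # fixed seed so the order is reproducible

# One JSON object per line; ensure_ascii=False keeps Cyrillic text readable.
jsonl = "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples)
print(jsonl)
```

Doubling each pair this way is what turns ~160k pairs into ~320k training examples.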

| Source | Pairs | Link |
| --- | ---: | --- |
| Khakas–Russian Parallel Corpus | 159,213 | adeshkin/khakas-russian-parallel-corpus |
| Google SmolSent | 863 | adeshkin/google-smol-en-ru-kjh (smolsent) |
| Google SmolDoc | 825 | adeshkin/google-smol-en-ru-kjh (smoldoc) |

Evaluation

Evaluated on the FLORES+ devtest split (1,012 sentence pairs) using SacreBLEU:

| Direction | BLEU | chrF++ |
| --- | ---: | ---: |
| kjh → ru | 21.09 | 46.18 |
| ru → kjh | 16.82 | 48.86 |

The FLORES+ dev split (997 sentences) was used for validation during training.

License

This model is distributed under the Apache 2.0 License, as the base model is licensed under the same terms.
