SentenceTransformer based on sentence-transformers/LaBSE
This is a sentence-transformers model fine-tuned from sentence-transformers/LaBSE on the khakas-russian-parallel-corpus dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for retrieval, for example matching Khakas sentences to their Russian translations.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/LaBSE
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Supported Modality: Text
- Training Dataset: khakas-russian-parallel-corpus
- Languages: Khakas (kjh), Russian (ru)
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
(1): Pooling({'embedding_dimension': 768, 'pooling_mode': 'cls', 'include_prompt': True})
(2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh', 'module_input_name': 'sentence_embedding', 'module_output_name': 'sentence_embedding'})
(3): Normalize({})
)
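The modules listed above form a pipeline: the Transformer produces token embeddings, Pooling keeps the CLS token, Dense applies a 768-to-768 linear layer with Tanh, and Normalize rescales to unit length. A minimal NumPy sketch of stages (1) to (3) follows. The token embeddings and the Dense weights here are random stand-ins (assumptions for illustration), not the model's trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the Transformer output: (batch, seq_len, hidden) token embeddings.
token_embeddings = rng.normal(size=(2, 5, 768))

# Hypothetical Dense parameters; the real model loads trained values.
W = rng.normal(scale=0.02, size=(768, 768))
b = np.zeros(768)

def pool_dense_normalize(tokens, weight, bias):
    cls = tokens[:, 0, :]                       # (1) CLS pooling: first token's vector
    dense = np.tanh(cls @ weight.T + bias)      # (2) Dense layer with Tanh activation
    norms = np.linalg.norm(dense, axis=1, keepdims=True)
    return dense / norms                        # (3) L2 normalization to unit length

sentence_embeddings = pool_dense_normalize(token_embeddings, W, b)
print(sentence_embeddings.shape)
print(np.linalg.norm(sentence_embeddings, axis=1))
```

Because the final stage normalizes every embedding, the cosine similarity of two embeddings equals their dot product.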
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("adeshkin/labse-kjh-ru-mnrl-1")
# Run inference
sentences = [
'Тӧреенде сағыпчатхан чуртас узуны ортымах чуртас узунын санирында тузаланылча.',
'Ожидаемая продолжительность жизни при рождении используется в качестве средней продолжительности жизни.',
'Уранча. Торгаях, ты зачем беспокоишь гостя нашими заботами?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000, 0.7541, -0.0902],
# [ 0.7541, 1.0000, -0.1342],
# [-0.0902, -0.1342, 1.0000]])
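For retrieval, the similarity matrix is usually reduced to a ranked list of candidates per query. The sketch below shows one way to do that with plain NumPy. The helper name `top_k_matches` and the toy 3-dimensional vectors are illustrative assumptions, not part of the library.

```python
import numpy as np

def top_k_matches(query_embeddings, doc_embeddings, k=1):
    """Return, for each query, the indices of the k most cosine-similar documents."""
    q = query_embeddings / np.linalg.norm(query_embeddings, axis=1, keepdims=True)
    d = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    scores = q @ d.T                            # cosine similarity matrix
    return np.argsort(-scores, axis=1)[:, :k]   # highest scores first

# Toy vectors standing in for real sentence embeddings.
queries = np.array([[1.0, 0.1, 0.0], [0.0, 0.2, 1.0]])
docs = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(top_k_matches(queries, docs, k=2))
```

With the model loaded as above, `query_embeddings` could come from `model.encode(...)` on Khakas sentences and `doc_embeddings` from `model.encode(...)` on Russian candidates.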
Evaluation
Metrics
Translation
- Dataset: kjh-ru-random
- Evaluated with TranslationEvaluator
| Metric | Value |
|---|---|
| src2trg_accuracy | 0.9548 |
| trg2src_accuracy | 0.951 |
| mean_accuracy | 0.9529 |
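The accuracies above measure how often a sentence's nearest neighbor in the other language is its true translation. The following is a minimal sketch of that idea on toy vectors, not the evaluator's actual implementation. The function name `translation_accuracy` and the example vectors are assumptions for illustration.

```python
import numpy as np

def translation_accuracy(src_embeddings, trg_embeddings):
    """Row i of each matrix is assumed to hold a translation pair."""
    src = src_embeddings / np.linalg.norm(src_embeddings, axis=1, keepdims=True)
    trg = trg_embeddings / np.linalg.norm(trg_embeddings, axis=1, keepdims=True)
    sim = src @ trg.T
    expected = np.arange(len(sim))
    src2trg = float(np.mean(np.argmax(sim, axis=1) == expected))  # best target per source
    trg2src = float(np.mean(np.argmax(sim, axis=0) == expected))  # best source per target
    return src2trg, trg2src, (src2trg + trg2src) / 2

src = np.eye(3)
trg = np.array([[1.0, 0.0, 0.0], [1.0, 0.2, 0.0], [0.0, 0.0, 1.0]])
src2trg, trg2src, mean_acc = translation_accuracy(src, trg)
print(src2trg, trg2src, mean_acc)
```

In this toy case every source finds its target, but the second target is closer to the first source than to its own, so the two directions differ. The reported mean is consistent with the same averaging: (0.9548 + 0.951) / 2 = 0.9529.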
Training Details
Training Dataset
khakas-russian-parallel-corpus
- Dataset: khakas-russian-parallel-corpus at 318e0f5
- Size: 157,620 training samples
- Columns: kjh and ru
- Approximate statistics based on the first 1000 samples:
| | kjh | ru |
|---|---|---|
| type | string | string |
| details | min: 6 tokens; mean: 28.24 tokens; max: 111 tokens | min: 4 tokens; mean: 20.11 tokens; max: 84 tokens |
- Samples:
| kjh | ru |
|---|---|
| Тӧреенде сағыпчатхан чуртас узуны ортымах чуртас узунын санирында тузаланылча. | Ожидаемая продолжительность жизни при рождении используется в качестве средней продолжительности жизни. |
| ТЕЛЕФОН НОМЕРІ (пар полза) 11. | НОМЕР ТЕЛЕФОНА (если имеется) 11. |
| Танығлар піріктірілген полза, ол кӧзідіг прай тоғынчатхан танығларның хыраның узунының суммазынаң пиріл парған. | Если признаки составные, данный показатель представлен суммой длины поля всех задействованных признаков. |
- Loss: MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false,
    "directions": [
        "query_to_doc"
    ],
    "partition_mode": "joint",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
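With these parameters, the loss treats every other target sentence in the batch as a negative: cosine similarities are multiplied by the scale (20.0) and passed through a cross-entropy whose correct class is the paired sentence. The following NumPy sketch illustrates that computation for the query-to-doc direction only. It is a simplified illustration under those assumptions, not the library's implementation, and the function name `mnrl_loss` is hypothetical.

```python
import numpy as np

def mnrl_loss(anchors, positives, scale=20.0):
    """In-batch-negatives cross-entropy over scaled cosine similarities."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                            # row i: anchor i vs. all positives
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # correct pair sits on the diagonal

aligned = mnrl_loss(np.eye(3), np.eye(3))
misaligned = mnrl_loss(np.eye(3), np.roll(np.eye(3), 1, axis=0))
print(aligned, misaligned)
```

The aligned batch yields a loss near zero, while the misaligned batch, where each anchor's paired sentence is orthogonal to it, yields a loss close to the scale value.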
Evaluation Dataset
khakas-russian-parallel-corpus
- Dataset: khakas-russian-parallel-corpus at 318e0f5
- Size: 1,593 evaluation samples
- Columns: kjh and ru
- Approximate statistics based on the first 1000 samples:
| | kjh | ru |
|---|---|---|
| type | string | string |
| details | min: 7 tokens; mean: 28.36 tokens; max: 146 tokens | min: 5 tokens; mean: 20.47 tokens; max: 97 tokens |
- Samples:
| kjh | ru |
|---|---|
| Чуртас тооза пос чирінің омазын чӱреенде ал чӧрген, хайда даа полза, Хакас чирінеңер кӧглеен. | Всю жизнь носил в сердце образ своей земли, где бы ни был, воспевал Хакасскую землю. |
| 2 Пил палых сурча | 2 Таймень-рыба спрашивает: |
| чолға сығар тим чит килген | наступила пора собираться в путь |
- Loss: MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false,
    "directions": [
        "query_to_doc"
    ],
    "partition_mode": "joint",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
Training Hyperparameters
Non-Default Hyperparameters
- learning_rate: 2e-05
- num_train_epochs: 1
- warmup_steps: 1000
- fp16: True
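To show how the learning rate and warmup settings interact, the sketch below computes the learning rate at a given step. It assumes a linear warmup followed by linear decay to zero (the card does not state the scheduler) and a total of 19,703 steps, a value estimated from the epoch fractions in the training logs rather than reported directly. The function name `learning_rate_at` is hypothetical.

```python
def learning_rate_at(step, peak_lr=2e-05, warmup_steps=1000, total_steps=19703):
    """Linear warmup to peak_lr, then linear decay to zero (assumed schedule)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)

for step in (0, 500, 1000, 10000, 19703):
    print(step, learning_rate_at(step))
```

Under these assumptions the rate climbs to 2e-05 over the first 1000 steps, roughly the first 5% of the single epoch, and then falls steadily until training ends.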
Training Logs
| Epoch | Step | Training Loss | Validation Loss | kjh-ru-random_mean_accuracy |
|---|---|---|---|---|
| 0.0254 | 500 | 0.1515 | 0.0636 | 0.7759 |
| 0.0508 | 1000 | 0.0661 | 0.0503 | 0.8267 |
| 0.0761 | 1500 | 0.0460 | 0.0285 | 0.8534 |
| 0.1015 | 2000 | 0.0402 | 0.0271 | 0.8628 |
| 0.1269 | 2500 | 0.0328 | 0.0240 | 0.8741 |
| 0.1523 | 3000 | 0.0253 | 0.0293 | 0.8851 |
| 0.1776 | 3500 | 0.0247 | 0.0231 | 0.8930 |
| 0.2030 | 4000 | 0.0285 | 0.0157 | 0.9090 |
| 0.2284 | 4500 | 0.0216 | 0.0153 | 0.9002 |
| 0.2538 | 5000 | 0.0172 | 0.0142 | 0.9171 |
| 0.2791 | 5500 | 0.0215 | 0.0170 | 0.8983 |
| 0.3045 | 6000 | 0.0172 | 0.0138 | 0.9187 |
| 0.3299 | 6500 | 0.0109 | 0.0162 | 0.9175 |
| 0.3553 | 7000 | 0.0146 | 0.0115 | 0.9253 |
| 0.3807 | 7500 | 0.0144 | 0.0149 | 0.9278 |
| 0.4060 | 8000 | 0.0116 | 0.0101 | 0.9347 |
| 0.4314 | 8500 | 0.0119 | 0.0142 | 0.9369 |
| 0.4568 | 9000 | 0.0196 | 0.0127 | 0.9382 |
| 0.4822 | 9500 | 0.0090 | 0.0120 | 0.9372 |
| 0.5075 | 10000 | 0.0123 | 0.0129 | 0.9438 |
| 0.5329 | 10500 | 0.0113 | 0.0086 | 0.9397 |
| 0.5583 | 11000 | 0.0091 | 0.0113 | 0.9435 |
| 0.5837 | 11500 | 0.0118 | 0.0104 | 0.9419 |
| 0.6090 | 12000 | 0.0076 | 0.0099 | 0.9429 |
| 0.6344 | 12500 | 0.0115 | 0.0081 | 0.9401 |
| 0.6598 | 13000 | 0.0074 | 0.0095 | 0.9466 |
| 0.6852 | 13500 | 0.0116 | 0.0090 | 0.9466 |
| 0.7106 | 14000 | 0.0107 | 0.0082 | 0.9520 |
| 0.7359 | 14500 | 0.0125 | 0.0068 | 0.9498 |
| 0.7613 | 15000 | 0.0109 | 0.0092 | 0.9507 |
| 0.7867 | 15500 | 0.0064 | 0.0069 | 0.9529 |
| 0.8121 | 16000 | 0.0077 | 0.0079 | 0.9532 |
| 0.8374 | 16500 | 0.0063 | 0.0067 | 0.9539 |
| 0.8628 | 17000 | 0.0072 | 0.0057 | 0.9539 |
| 0.8882 | 17500 | 0.0075 | 0.0060 | 0.9545 |
| 0.9136 | 18000 | 0.0098 | 0.0061 | 0.9539 |
| 0.9389 | 18500 | 0.0057 | 0.0058 | 0.9539 |
| 0.9643 | 19000 | 0.0079 | 0.0059 | 0.9526 |
| 0.9897 | 19500 | 0.0052 | 0.0058 | 0.9529 |
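The epoch fractions in the log can be cross-checked against the training set size. The arithmetic below is a rough consistency check, assuming a single device and no gradient accumulation. The card does not state the batch size, so the value it arrives at is an inference, not a reported setting.

```python
import math

train_samples = 157_620                      # from the Training Dataset section
logged_step, logged_epoch = 19_500, 0.9897   # last row of the training log

steps_per_epoch = logged_step / logged_epoch          # steps in one full epoch
implied_batch_size = train_samples / steps_per_epoch  # samples consumed per step

print(round(steps_per_epoch), round(implied_batch_size, 2))
print(math.ceil(train_samples / 8))  # steps per epoch if the batch size were 8
```

The implied batch size comes out very close to 8, and a batch size of 8 gives a step count per epoch that matches the logged epoch fractions.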
Training Time
- Training: 1.4 hours
Framework Versions
- Python: 3.12.13
- Sentence Transformers: 5.4.1
- Transformers: 5.0.0
- PyTorch: 2.10.0+cu128
- Accelerate: 1.13.0
- Datasets: 4.0.0
- Tokenizers: 0.22.2
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{oord2019representationlearningcontrastivepredictive,
title={Representation Learning with Contrastive Predictive Coding},
author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
year={2019},
eprint={1807.03748},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/1807.03748},
}
