ModernBERT-small for General Purpose Similarity
This is a sentence-transformers model trained on the nli, quora, natural_questions, stsb, sentence_compression, simple_wiki, altlex, coco_captions, flickr30k_captions, yahoo_answers, and stack_exchange datasets. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
This model is based on the wide architecture of johnnyboycurtis/ModernBERT-small, which uses the following configuration:
```python
from transformers import ModernBertConfig, ModernBertModel

small_modernbert_config = ModernBertConfig(
    hidden_size=384,               # a common dimension for small embedding models
    num_hidden_layers=12,          # significantly fewer layers than ModernBERT-base's 22
    num_attention_heads=6,         # must divide hidden_size evenly (384 / 6 = 64 per head)
    intermediate_size=1536,        # 4 * hidden_size: a very wide MLP for a model this size
    max_position_embeddings=1024,  # max sequence length; ModernBERT's original is 8192
)
model = ModernBertModel(small_modernbert_config)
```
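As a rough sanity check on this configuration, the parameter count can be estimated by hand. The sketch below assumes `ModernBertConfig` defaults for everything not listed above: a 50,368-token vocabulary, bias-free linear layers, weight-only LayerNorms, a GeGLU MLP whose input projection has width `2 * intermediate_size`, and no attention LayerNorm in the first layer. The function name is hypothetical.

```python
def modernbert_param_count(vocab_size=50368, hidden=384, layers=12, intermediate=1536):
    """Estimate ModernBertModel parameters, assuming default bias-free settings."""
    embeddings = vocab_size * hidden                          # token embedding matrix
    embedding_norm = hidden                                   # LayerNorm weight (no bias)
    attn = hidden * 3 * hidden + hidden * hidden              # fused Wqkv + output projection Wo
    mlp = hidden * 2 * intermediate + intermediate * hidden   # GeGLU Wi (2x wide) + Wo
    # Every layer has an mlp_norm; all layers except the first also have an attn_norm.
    norms = layers * hidden + (layers - 1) * hidden
    final_norm = hidden
    return embeddings + embedding_norm + layers * (attn + mlp) + norms + final_norm

total = modernbert_param_count()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate is 47,662,464, consistent with the 47.7M-parameter size listed for this checkpoint. About 40% of that is the embedding matrix, and within each layer the MLP accounts for exactly 75% of the weights, which is what makes this small model "wide".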
Model Details
Model Description
- Model Type: Sentence Transformer
- Maximum Sequence Length: 1024 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
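- Training Datasets: nli, quora, natural_questions, stsb, sentence_compression, simple_wiki, altlex, coco_captions, flickr30k_captions, yahoo_answers, stack_exchange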
- Language: en
- License: apache-2.0
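Cosine similarity compares only the direction of two embeddings, not their length. A minimal NumPy sketch of a pairwise score matrix (an illustrative stand-in, not the library's own implementation; `cosine_similarity_matrix` is a hypothetical name and the embeddings are random placeholders):

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return a (len(a), len(b)) matrix of cosine similarities between rows."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)  # L2-normalize each row
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T  # dot products of unit vectors are cosines

rng = np.random.default_rng(0)
queries = rng.normal(size=(1, 384))    # stand-ins for 384-dimensional embeddings
documents = rng.normal(size=(3, 384))
print(cosine_similarity_matrix(queries, documents).shape)  # (1, 3)
```

Scores lie in [-1, 1]. Because the measure is scale-invariant, normalizing embeddings once up front lets a plain dot product be used at search time.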
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
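The Pooling module is configured for mean pooling (`pooling_mode_mean_tokens: True`): the sentence embedding is the average of the token embeddings, with padding positions excluded via the attention mask. A NumPy sketch of that operation, with hypothetical names and toy shapes in place of real 384-dimensional outputs:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padded positions.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    return summed / counts

tokens = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)  # batch=2, seq_len=3, dim=4
mask = np.array([[1, 1, 0], [1, 1, 1]])  # the first sequence ends with one padding token
print(mean_pool(tokens, mask))
```

The first row averages only its two real tokens, so the padding vector does not pull the embedding toward arbitrary values.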
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("johnnyboycurtis/ModernBERT-small-sts")
# Run inference
queries = [
    "A sleeping baby in a pink striped outfit.",
]
documents = [
    'A little baby cradled in someones arms.',
    'A group of hikers traveling along a rock strewn creek bed.',
    'Three young men and a young woman wearing sneakers are leaping in midair at the top of a flight of concrete stairs.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 384) (3, 384)
# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.5804,  0.0193, -0.1261]])
```
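For semantic search, a row of scores from `model.similarity` is typically turned into a ranking. A plain-Python sketch that reuses the example documents and the scores printed above as fixed input, so it runs without downloading the model:

```python
documents = [
    "A little baby cradled in someones arms.",
    "A group of hikers traveling along a rock strewn creek bed.",
    "Three young men and a young woman wearing sneakers are leaping in midair at the top of a flight of concrete stairs.",
]
scores = [0.5804, 0.0193, -0.1261]  # one query's row of the similarity matrix

# Sort document indices by descending similarity and keep the top k.
top_k = 2
ranking = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)[:top_k]
for rank, idx in enumerate(ranking, start=1):
    print(f"{rank}. {scores[idx]:.4f}  {documents[idx]}")
```

With the real PyTorch tensor, `similarities[0].topk(top_k)` returns the same top scores and their indices directly.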
Evaluation
Metrics
Triplet
- Dataset: all-nli-dev
- Evaluated with TripletEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy | 0.8808 |
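Triplet `cosine_accuracy` is the fraction of (anchor, positive, negative) triplets for which the anchor is more similar to the positive than to the negative. A NumPy sketch of that metric on random toy embeddings (not the evaluator's own code; names and data are hypothetical):

```python
import numpy as np

def triplet_cosine_accuracy(anchors, positives, negatives) -> float:
    """Fraction of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    def row_cos(x, y):
        return (x * y).sum(axis=1) / (np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1))
    return float(np.mean(row_cos(anchors, positives) > row_cos(anchors, negatives)))

rng = np.random.default_rng(42)
anchors = rng.normal(size=(100, 384))
positives = anchors + 0.5 * rng.normal(size=(100, 384))  # noisy copies: close to the anchors
negatives = rng.normal(size=(100, 384))                  # unrelated vectors
print(triplet_cosine_accuracy(anchors, positives, negatives))
```

Read this way, 0.8808 means the model places the positive above the negative for about 88% of the all-nli-dev triplets.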
Semantic Similarity
- Dataset: sts-dev
- Evaluated with EmbeddingSimilarityEvaluator
| Metric | Value |
|---|---|
| pearson_cosine | 0.829 |
| spearman_cosine | 0.8276 |
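`pearson_cosine` correlates the model's cosine scores with the gold similarity labels directly, while `spearman_cosine` correlates their ranks, so it rewards getting the ordering right regardless of scale. A NumPy sketch with hypothetical labels and scores; the ranking ignores ties for brevity, unlike a full Spearman implementation such as SciPy's:

```python
import numpy as np

def pearson(x, y) -> float:
    return float(np.corrcoef(x, y)[0, 1])

def ranks(v) -> np.ndarray:
    # 0 = smallest value; assumes no tied values.
    return np.argsort(np.argsort(np.asarray(v)))

def spearman(x, y) -> float:
    return pearson(ranks(x), ranks(y))

gold = [0.0, 1.0, 2.5, 4.0, 5.0]            # hypothetical gold labels on a 0-5 scale
predicted = [0.05, 0.10, 0.30, 0.80, 0.95]  # hypothetical cosine scores
print(f"pearson={pearson(gold, predicted):.3f}, spearman={spearman(gold, predicted):.3f}")
```

Here the predicted scores are perfectly ordered but not linearly related to the labels, so Spearman is 1.0 while Pearson is slightly lower.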
Training Details
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 128
- learning_rate: 0.0005
- weight_decay: 0.01
- lr_scheduler_type: cosine
- warmup_ratio: 0.05
- bf16: True
- bf16_full_eval: True
- load_best_model_at_end: True
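With `lr_scheduler_type: cosine` and `warmup_ratio: 0.05`, the learning rate rises linearly from 0 to 5e-4 over the first 5% of optimizer steps, then follows a half-cosine down to 0. A plain-Python sketch that mirrors the shape of the Transformers cosine-with-warmup schedule; the total of 1,000 steps is a hypothetical value for illustration, since the card does not state the step count:

```python
import math

def cosine_lr(step: int, total_steps: int, base_lr: float = 5e-4, warmup_ratio: float = 0.05) -> float:
    """Learning rate at `step` for linear warmup followed by half-cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)  # linear ramp up from 0
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

total = 1000  # hypothetical number of optimizer steps
for step in (0, 25, 50, 525, 1000):
    print(step, f"{cosine_lr(step, total):.2e}")
```

The peak (5e-4) is reached at step 50, the rate is halved at the midpoint of the decay phase, and it reaches 0 at the final step.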
Framework Versions
- Python: 3.11.13
- Sentence Transformers: 5.0.0
- Transformers: 4.53.1
- PyTorch: 2.7.1+cu128
- Accelerate: 1.8.1
- Datasets: 4.0.0
- Tokenizers: 0.21.2
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
CachedMultipleNegativesRankingLoss
```bibtex
@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```
CoSENTLoss
```bibtex
@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}
```
- Model Size: 47.7M parameters (Safetensors)
- Tensor Type: BF16
