ModernBERT-small for General Purpose Similarity

This is a sentence-transformers model trained on the nli, quora, natural_questions, stsb, sentence_compression, simple_wiki, altlex, coco_captions, flickr30k_captions, yahoo_answers and stack_exchange datasets. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

This model is based on the wide ModernBERT-small architecture from johnnyboycurtis/ModernBERT-small, defined by the following configuration:

from transformers import ModernBertConfig, ModernBertModel

small_modernbert_config = ModernBertConfig(
    hidden_size=384,               # a common dimension for small embedding models
    num_hidden_layers=12,          # far fewer layers than ModernBERT-base's 22
    num_attention_heads=6,         # must divide hidden_size: 384 / 6 = 64 dims per head
    intermediate_size=1536,        # 4 * hidden_size, a deliberately wide feed-forward layer
    max_position_embeddings=1024,  # max sequence length; ModernBERT's default is 8192
)

model = ModernBertModel(small_modernbert_config)
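As a sanity check on the configuration above, the sketch below estimates the parameter count by hand. It assumes ModernBERT defaults that this card does not state (a 50,368-token vocabulary, bias-free linear layers, and a gated GeGLU feed-forward layer whose input projection is 2 * intermediate_size wide) and ignores the small LayerNorm weights, so treat it as an approximation rather than an exact accounting.

```python
# Rough parameter count for the ModernBertConfig above.
# Assumptions (ModernBERT defaults as we understand them, not stated in this card):
#   - vocab_size of 50,368 (the config above does not override it)
#   - bias-free linear layers
#   - a GeGLU feed-forward block whose input projection emits 2 * intermediate_size
# LayerNorm weights (a few thousand parameters) are ignored.

def estimate_params(vocab_size, hidden, layers, intermediate):
    embeddings = vocab_size * hidden
    attention = hidden * 3 * hidden + hidden * hidden        # fused QKV + output projection
    mlp = hidden * 2 * intermediate + intermediate * hidden  # gated input + output projection
    return embeddings + layers * (attention + mlp)

hidden, heads = 384, 6
assert hidden % heads == 0
head_dim = hidden // heads  # 64 dimensions per attention head

total = estimate_params(vocab_size=50_368, hidden=hidden, layers=12, intermediate=1536)
print(head_dim, total, f"{total / 1e6:.1f}M")  # 64 47652864 47.7M
```

Under these assumptions the estimate lands on the 47.7M parameters reported for this model, and roughly 19.3M of them (about 40%) sit in the token embedding matrix, which is typical for small encoders with a large vocabulary.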

Model Details

Model Description

Model Type: Sentence Transformer
Maximum Sequence Length: 1024 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity
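Training Datasets: nli, quora, natural_questions, stsb, sentence_compression, simple_wiki, altlex, coco_captions, flickr30k_captions, yahoo_answers, stack_exchange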
Language: en
License: apache-2.0
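Cosine similarity compares embeddings by angle rather than magnitude, so scores fall between -1 and 1. A plain-Python sketch of the formula, using toy 3-dimensional vectors in place of real 384-dimensional model outputs:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for sentence embeddings.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # parallel vectors: close to 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal vectors: 0.0
```

Because only direction matters, scaling an embedding does not change its score, which is why cosine is a common default for comparing sentence embeddings.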

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
 (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
 (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
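The Pooling module above has pooling_mode_mean_tokens set to True, so the sentence embedding is the average of the Transformer's token embeddings. The sketch below shows that computation in plain Python with toy 2-dimensional vectors; skipping padding positions via the attention mask follows the usual approach, but this is an illustration, not the library's code.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask is 1, skipping padding positions."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            summed = [s + x for s, x in zip(summed, vector)]
            count += 1
    return [s / max(count, 1) for s in summed]  # guard against an all-padding input

# Three token vectors (dimension 2 instead of 384); the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The padded position contributes nothing, so the same sentence yields the same embedding whether or not it was padded inside a batch.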

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

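Then you can load this model and run inference. The encode_query and encode_document methods used below were introduced in Sentence Transformers v5.0.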

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("johnnyboycurtis/ModernBERT-small-sts")
# Run inference
queries = [
 "A sleeping baby in a pink striped outfit.",
]
documents = [
 'A little baby cradled in someones arms.',
 'A group of hikers traveling along a rock strewn creek bed.',
 'Three young men and a young woman wearing sneakers are leaping in midair at the top of a flight of concrete stairs.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 384) (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.5804, 0.0193, -0.1261]])
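For semantic search, a row of similarity scores is usually turned into a ranking of documents. A minimal sketch, reusing the scores printed above; rank_documents is a hypothetical helper for illustration, and Sentence Transformers also ships its own search utility (sentence_transformers.util.semantic_search).

```python
def rank_documents(scores, top_k=None):
    """Return (document index, score) pairs sorted from most to least similar."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k] if top_k is not None else ranked

# Scores copied from the example output above: one query against three documents.
scores = [0.5804, 0.0193, -0.1261]
print(rank_documents(scores, top_k=2))  # [(0, 0.5804), (1, 0.0193)]
```

Here the baby-in-arms caption ranks first for the sleeping-baby query, as expected.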

Evaluation

Metrics

Triplet

Dataset: all-nli-dev
Evaluated with TripletEvaluator

Metric	Value
cosine_accuracy	0.8808
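cosine_accuracy is the fraction of (anchor, positive, negative) triplets for which the anchor embedding is more similar, by cosine similarity, to the positive than to the negative; 0.8808 means this held for about 88% of the all-nli-dev triplets. A plain-Python sketch of the metric on toy embeddings (not the TripletEvaluator source):

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def triplet_accuracy(triplets):
    """Fraction of (anchor, positive, negative) triplets where the anchor is
    more similar to the positive than to the negative."""
    correct = sum(
        cosine_similarity(a, p) > cosine_similarity(a, n) for a, p, n in triplets
    )
    return correct / len(triplets)

# Toy 2-dimensional embeddings; the second triplet is ordered the wrong way round.
triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.1]),
]
print(triplet_accuracy(triplets))  # 0.5
```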

Semantic Similarity

Dataset: sts-dev
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.829
spearman_cosine	0.8276
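pearson_cosine and spearman_cosine correlate the model's cosine scores with the gold similarity labels of sts-dev: Pearson measures linear agreement, while Spearman measures agreement in ranking order (Pearson applied to ranks). A plain-Python sketch of both on made-up scores; it is illustrative only and not the evaluator's own code.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy data: gold STS labels vs. hypothetical cosine scores from a model.
gold = [0.0, 1.5, 3.0, 4.5, 5.0]
predicted = [0.10, 0.35, 0.30, 0.80, 0.95]
print(round(pearson(gold, predicted), 4), round(spearman(gold, predicted), 4))
```

Only the second and third predictions are out of order here, so Spearman comes out at 0.9; Pearson additionally penalizes scores that are not linearly spaced like the labels.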

Training Details

Training Datasets

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 128
learning_rate: 0.0005
weight_decay: 0.01
lr_scheduler_type: cosine
warmup_ratio: 0.05
bf16: True
bf16_full_eval: True
load_best_model_at_end: True
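With lr_scheduler_type: cosine and warmup_ratio: 0.05, the learning rate ramps up linearly over the first 5% of training steps to the peak of 0.0005, then follows a half cosine down toward zero. The sketch below reproduces that shape in plain Python; the total step count is a hypothetical placeholder because the card does not report it, and rounding the warmup length up with ceil is an assumption.

```python
import math

def lr_at_step(step, total_steps, peak_lr=5e-4, warmup_ratio=0.05):
    """Linear warmup to peak_lr, then cosine decay to 0 (a sketch of the usual
    'cosine' scheduler with warmup, not copied from the Transformers source)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Hypothetical run length: the card does not report the number of training steps.
total = 1000
for step in (0, 25, 50, 525, 1000):
    print(step, lr_at_step(step, total))
```

For 1,000 steps the peak is reached at step 50, the rate is back to half the peak halfway through the decay (step 525), and it reaches zero at the final step.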

All Hyperparameters

Training Logs

Framework Versions

Python: 3.11.13
Sentence Transformers: 5.0.0
Transformers: 4.53.1
PyTorch: 2.7.1+cu128
Accelerate: 1.8.1
Datasets: 4.0.0
Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
 title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
 author = "Reimers, Nils and Gurevych, Iryna",
 booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
 month = "11",
 year = "2019",
 publisher = "Association for Computational Linguistics",
 url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
 title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
 author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
 year={2021},
 eprint={2101.06983},
 archivePrefix={arXiv},
 primaryClass={cs.LG}
}
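The citation above suggests that training used CachedMultipleNegativesRankingLoss. The underlying multiple negatives ranking objective treats every other positive in the batch as a negative: each anchor's cosine scores against all positives are scaled and passed to a cross-entropy whose target is the anchor's own positive. The cached variant (GradCache, Gao et al. 2021) computes the same loss in mini-batches so that large batches fit in limited memory. The plain-Python sketch below shows only the objective; the scale of 20 is an assumption (the library default, as far as we know).

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where anchor i must pick positive i out of all positives
    in the batch; the other positives act as in-batch negatives."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine_similarity(anchor, p) for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]  # each anchor paired with its own direction
swapped = [[0.0, 1.0], [1.0, 0.0]]  # pairs deliberately mismatched
print(multiple_negatives_ranking_loss(anchors, matched))  # close to 0
print(multiple_negatives_ranking_loss(anchors, swapped))  # close to 20
```

Larger batches supply more in-batch negatives per anchor, which is why a memory-saving cached variant is attractive for this loss.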

CoSENTLoss

@online{kexuefm-8847,
 title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
 author={Su Jianlin},
 year={2022},
 month={Jan},
 url={https://kexue.fm/archives/8847},
}
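CoSENTLoss, cited above, trains on sentence pairs labeled with a graded similarity score and only cares about order: for every two pairs where one has the higher gold label, it penalizes the model when that pair's cosine score is not higher, via log(1 + Σ exp(λ(cos_low − cos_high))). A plain-Python sketch of that formula, with λ = 20 assumed as the scale:

```python
import math

def cosent_loss(cos_scores, labels, scale=20.0):
    """log(1 + sum over pairs (i, j) with labels[i] < labels[j] of
    exp(scale * (cos_scores[i] - cos_scores[j])))."""
    terms = [0.0]  # exp(0) = 1 supplies the "1 +" inside the log
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] < labels[j]:
                terms.append(scale * (cos_scores[i] - cos_scores[j]))
    m = max(terms)  # stable log-sum-exp
    return m + math.log(sum(math.exp(t - m) for t in terms))

labels = [1.0, 0.0]  # the first sentence pair is the more similar one
print(cosent_loss([0.9, 0.1], labels))  # correct order: close to 0
print(cosent_loss([0.1, 0.9], labels))  # wrong order: close to 16
```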


Model size: 47.7M params (Safetensors)
Tensor type: BF16


URL: https://huggingface.co/johnnyboycurtis/ModernBERT-small-sts
