ModernBERT-small-1.5 for General Purpose Similarity

This is a sentence-transformers model trained on the nli, quora, natural_questions, stsb, sentence_compression, simple_wiki, altlex, coco_captions, flickr30k_captions, yahoo_answers, and stack_exchange datasets. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar embedding-based tasks.

Model Details

Model Description

Model Type: Sentence Transformer
Maximum Sequence Length: 1024 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity
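Training Datasets: nli, quora, natural_questions, stsb, sentence_compression, simple_wiki, altlex, coco_captions, flickr30k_captions, yahoo_answers, stack_exchange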
Language: en
License: apache-2.0

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
 (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
 (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
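
The Pooling module above has only pooling_mode_mean_tokens enabled, so the sentence vector is the average of the token vectors. The following is a minimal, dependency-free sketch of that idea, not the library's implementation. The function name mean_pool and the toy inputs are hypothetical; it assumes padding positions are marked with 0 in an attention mask and are excluded from the average.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vector)]
            count += 1
    # Guard against an all-padding input to avoid dividing by zero.
    return [t / max(count, 1) for t in total]


# Toy input: three token vectors of width 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

In the real model each token vector has 384 entries, which is why the pooled output is 384-dimensional.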

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("johnnyboycurtis/ModernBERT-small-1.5-sts")
# Run inference
queries = [
 "A sleeping baby in a pink striped outfit.",
]
documents = [
 'A little baby cradled in someones arms.',
 'A group of hikers traveling along a rock strewn creek bed.',
 'Three young men and a young woman wearing sneakers are leaping in midair at the top of a flight of concrete stairs.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 384) (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.5800, 0.0298, -0.0471]])
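
The similarity scores in the usage example are cosine similarities, and for semantic search they are typically used to rank documents against a query. The sketch below shows that ranking step on toy 2-dimensional vectors so it runs without downloading the model. The helper names cosine_similarity and rank_documents are hypothetical and the vectors are made up for illustration; they are not outputs of this model.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (document index, score) pairs sorted from most to least similar."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
ranking = rank_documents(query, docs)
print([index for index, _ in ranking])  # [1, 2, 0]
```

Because cosine similarity ignores vector length, the document [2.0, 0.0] scores the same as a unit vector pointing in the query's direction.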

Evaluation

Metrics

Triplet

Dataset: all-nli-dev
Evaluated with TripletEvaluator

Metric	Value
cosine_accuracy	0.8759
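
As a rough illustration of what cosine_accuracy measures: for each (anchor, positive, negative) triplet, the prediction counts as correct when the anchor is more similar to the positive than to the negative. The sketch below assumes that definition with no extra margin; the function names and toy triplets are hypothetical and this is not the TripletEvaluator source.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def triplet_cosine_accuracy(triplets):
    """Fraction of triplets where the positive beats the negative."""
    hits = sum(
        1
        for anchor, positive, negative in triplets
        if cosine(anchor, positive) > cosine(anchor, negative)
    )
    return hits / len(triplets)


# Toy triplets standing in for embedded (anchor, positive, negative) sentences.
triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),  # positive is closer: hit
    ([0.0, 1.0], [1.0, 0.0], [0.1, 0.9]),  # negative is closer: miss
]
accuracy = triplet_cosine_accuracy(triplets)
print(accuracy)  # 0.5
```

Under this reading, the reported 0.8759 means the positive outranked the negative in roughly 88% of the all-nli-dev triplets.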

Semantic Similarity

Dataset: sts-dev
Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.8249
spearman_cosine	0.8234
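
The two metrics above compare the model's cosine scores with gold similarity labels: pearson_cosine is the linear correlation of the raw values, and spearman_cosine is the same correlation computed on their ranks. The following sketch shows both on made-up numbers. It assumes no tied values for brevity, and the function names and data are hypothetical rather than taken from EmbeddingSimilarityEvaluator.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))


# Made-up cosine scores and gold labels for four sentence pairs.
predicted = [0.10, 0.40, 0.35, 0.90]
gold = [0.0, 2.5, 3.0, 5.0]
print(round(spearman(predicted, gold), 4))  # 0.8
```

Spearman only cares about ordering, so it is unaffected by the fact that cosine scores and gold labels live on different scales.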

Training Details

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 128
learning_rate: 0.0005
weight_decay: 0.01
lr_scheduler_type: cosine
warmup_ratio: 0.05
bf16: True
bf16_full_eval: True
load_best_model_at_end: True
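
To make learning_rate, lr_scheduler_type, and warmup_ratio concrete, the sketch below computes the learning rate at a given step under a linear warmup followed by cosine decay, which is the usual shape of a cosine schedule with warmup. The total step count is not stated in this card, so TOTAL_STEPS is a hypothetical value chosen for illustration, and the function is a simplified stand-in rather than the trainer's scheduler code.

```python
import math

PEAK_LR = 5e-4        # learning_rate from the list above
WARMUP_RATIO = 0.05   # warmup_ratio from the list above
TOTAL_STEPS = 1000    # hypothetical: the real number of training steps is not given


def learning_rate(step, total_steps=TOTAL_STEPS):
    """Linear warmup to PEAK_LR, then half-cosine decay down to zero."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the schedule at a few points: start, mid-warmup, midway, and end.
for step in (0, 25, 500, TOTAL_STEPS):
    print(step, learning_rate(step))
```

With these assumed numbers the rate climbs during the first 5% of steps, peaks at 0.0005, and then decays smoothly to zero by the final step.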

Framework Versions

Python: 3.11.13
Sentence Transformers: 5.0.0
Transformers: 4.53.1
PyTorch: 2.7.1+cu128
Accelerate: 1.8.1
Datasets: 4.0.0
Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
 title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
 author = "Reimers, Nils and Gurevych, Iryna",
 booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
 month = "11",
 year = "2019",
 publisher = "Association for Computational Linguistics",
 url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
 title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
 author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
 year={2021},
 eprint={2101.06983},
 archivePrefix={arXiv},
 primaryClass={cs.LG}
}

CoSENTLoss

@online{kexuefm-8847,
 title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
 author={Su Jianlin},
 year={2022},
 month={Jan},
 url={https://kexue.fm/archives/8847},
}


Safetensors

Model size: 26.9M params
Tensor type: BF16


URL: https://huggingface.co/johnnyboycurtis/ModernBERT-small-1.5-sts
