SentenceTransformer based on estrogen/ModernBERT-base-sbert-initialized

This is a sentence-transformers model fine-tuned from estrogen/ModernBERT-base-sbert-initialized on the all-nli dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: estrogen/ModernBERT-base-sbert-initialized
Maximum Sequence Length: 8192 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- all-nli
Language: en
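
The cosine similarity named above can be illustrated with a minimal sketch. The vectors are made-up toy values rather than model outputs, and the helper names `cosine_similarity` and `similarity_matrix` are illustrative, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine similarities, one row per input vector."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy 2-dimensional vectors (assumed values, not real embeddings).
toy = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
matrix = similarity_matrix(toy)
for row in matrix:
    print([round(value, 3) for value in row])
```

Cosine similarity ignores vector length, so `[0.0, 2.0]` and `[0.0, 1.0]` would score identically against any other vector.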

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
 (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
 (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
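
The Pooling module above has only `pooling_mode_mean_tokens` enabled, meaning the sentence embedding is the average of the token embeddings. The following is a toy sketch of that idea for a single sequence, assuming an attention mask marks which positions are real tokens. The function `mean_pool` is hypothetical and is not the library's implementation, which works on batched tensors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of one sequence, skipping padded positions."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding sequence
    return [s / count for s in summed]


# Two real tokens followed by one padding position (toy values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```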

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("estrogen/ModernBERT-base-nli-v3")
# Run inference
sentences = [
 'A middle-aged man works under the engine of a train on rail tracks.',
 'A guy is working on a train.',
 'A guy is driving to work.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
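
The training section of this card lists MatryoshkaLoss with dimensions 768, 512, 256, 128, and 64, which suggests that leading slices of an embedding are meant to remain usable on their own. Below is a minimal sketch of shortening a vector and re-normalizing it, assuming that intent. The helper `truncate_and_normalize` is hypothetical, and the input is a synthetic stand-in for a real embedding. This card reports no quality figures for the smaller sizes.

```python
import math

# Dimensions listed in this card's MatryoshkaLoss parameters.
MATRYOSHKA_DIMS = [768, 512, 256, 128, 64]


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"dim must be one of {MATRYOSHKA_DIMS}")
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]


# Synthetic 768-dimensional vector standing in for a model output.
full = [math.sin(i + 1) for i in range(768)]
small = truncate_and_normalize(full, 64)
print(len(small))
```

Sentence Transformers also accepts a `truncate_dim` argument when constructing a `SentenceTransformer`; check the library documentation for its exact behavior before relying on it.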

Evaluation

Metrics

Semantic Similarity

Datasets: sts-dev and sts-test
Evaluated with EmbeddingSimilarityEvaluator

Metric	sts-dev	sts-test
pearson_cosine	0.8602	0.8484
spearman_cosine	0.8651	0.8505
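
For readers unfamiliar with the two metrics, the sketch below shows how Pearson and Spearman correlations between gold similarity labels and cosine scores could be computed. It is a plain-Python illustration on invented numbers, not the code of EmbeddingSimilarityEvaluator. The names `pearson`, `ranks`, and `spearman` are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Invented gold labels and model scores for five sentence pairs.
gold_scores = [0.0, 1.0, 2.0, 3.0, 4.0]
cosine_scores = [0.10, 0.20, 0.30, 0.40, 0.95]
pearson_value = pearson(gold_scores, cosine_scores)
spearman_value = spearman(gold_scores, cosine_scores)
print(round(pearson_value, 4), round(spearman_value, 4))
```

In this toy data the ordering of the pairs is reproduced perfectly, so Spearman reaches 1 while Pearson stays lower because the relationship is not linear.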

Training Details

Training Dataset

all-nli

Dataset: all-nli at d482672
Size: 557,850 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 7 tokens mean: 10.46 tokens max: 46 tokens	min: 6 tokens mean: 12.91 tokens max: 40 tokens	min: 5 tokens mean: 13.49 tokens max: 51 tokens

Samples:

anchor	positive	negative
`A person on a horse jumps over a broken down airplane.`	`A person is outdoors, on a horse.`	`A person is at a diner, ordering an omelette.`
`Children smiling and waving at camera`	`There are children present`	`The kids are frowning`
`A boy is jumping on skateboard in the middle of a red bridge.`	`The boy does a skateboarding trick.`	`The boy skates down the sidewalk.`

Loss: MatryoshkaLoss with these parameters:

{
 "loss": "MultipleNegativesRankingLoss",
 "matryoshka_dims": [
 768,
 512,
 256,
 128,
 64
 ],
 "matryoshka_weights": [
 1,
 1,
 1,
 1,
 1
 ],
 "n_dims_per_step": -1
}
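
To make these parameters concrete, here is a rough sketch of how a Matryoshka-style objective could combine them: the inner ranking loss is evaluated on each truncated prefix of the embeddings and the weighted results are summed. This is a simplified plain-Python illustration, not the library's code. The function names are hypothetical, the `scale` of 20.0 is an assumed temperature that the card does not state, and the vectors are invented. A value of -1 for `n_dims_per_step` is read here as "use every listed dimension at each step".

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def ranking_loss(anchors, candidates, scale=20.0):
    """Cross-entropy where candidates[i] is the correct match for anchors[i].

    All other candidates act as negatives for that anchor.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        max_logit = max(logits)
        log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum - logits[i]
    return total / len(anchors)


def matryoshka_style_loss(anchors, candidates, dims, weights):
    """Weighted sum of the ranking loss over truncated embedding prefixes."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        short_anchors = [v[:dim] for v in anchors]
        short_candidates = [v[:dim] for v in candidates]
        total += weight * ranking_loss(short_anchors, short_candidates)
    return total


# Toy 4-dimensional "embeddings" with prefix sizes 4 and 2 (invented values).
anchors = [[1.0, 0.0, 0.1, 0.0], [0.0, 1.0, 0.0, 0.1]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]
negatives = [[-1.0, 0.0, 0.0, 0.0], [0.0, -1.0, 0.0, 0.0]]

# Positives first: candidate i is the true match for anchor i.
good = matryoshka_style_loss(anchors, positives + negatives, [4, 2], [1, 1])
# Negatives first: the "correct" slot now holds a mismatched sentence.
bad = matryoshka_style_loss(anchors, negatives + positives, [4, 2], [1, 1])
print(good < bad)  # True
```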

Evaluation Dataset

all-nli

Dataset: all-nli at d482672
Size: 6,584 evaluation samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 6 tokens mean: 18.25 tokens max: 69 tokens	min: 5 tokens mean: 9.88 tokens max: 30 tokens	min: 5 tokens mean: 10.48 tokens max: 29 tokens

Samples:

anchor	positive	negative
`Two women are embracing while holding to go packages.`	`Two woman are holding packages.`	`The men are fighting outside a deli.`
`Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.`	`Two kids in numbered jerseys wash their hands.`	`Two kids in jackets walk to school.`
`A man selling donuts to a customer during a world exhibition event held in the city of Angeles`	`A man selling donuts to a customer.`	`A woman drinks her coffee in a small cafe.`

Loss: MatryoshkaLoss with these parameters:

{
 "loss": "MultipleNegativesRankingLoss",
 "matryoshka_dims": [
 768,
 512,
 256,
 128,
 64
 ],
 "matryoshka_weights": [
 1,
 1,
 1,
 1,
 1
 ],
 "n_dims_per_step": -1
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 1024
per_device_eval_batch_size: 1024
num_train_epochs: 1
warmup_ratio: 0.1
bf16: True
batch_sampler: no_duplicates
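
As a rough sanity check on these settings, the number of optimizer steps and warmup steps can be estimated from the dataset size reported above. The sketch assumes a single device, no gradient accumulation, and that the final partial batch is kept; none of these is stated in the card, and the `no_duplicates` batch sampler may change the exact batch count.

```python
import math

# Values taken from this card.
train_samples = 557_850
per_device_train_batch_size = 1024
num_train_epochs = 1
warmup_ratio = 0.1

# Assumptions not stated in the card.
num_devices = 1
gradient_accumulation_steps = 1

effective_batch = per_device_train_batch_size * num_devices * gradient_accumulation_steps
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)  # 545 545 55
```

Under these assumptions the run is short, roughly 545 steps, with the learning rate warming up over about the first 55.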

Framework Versions

Python: 3.10.12
Sentence Transformers: 3.3.1
Transformers: 4.48.0.dev0
PyTorch: 2.1.0+cu118
Accelerate: 1.2.1
Datasets: 3.2.0
Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
 title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
 author = "Reimers, Nils and Gurevych, Iryna",
 booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
 month = "11",
 year = "2019",
 publisher = "Association for Computational Linguistics",
 url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
 title={Matryoshka Representation Learning},
 author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
 year={2024},
 eprint={2205.13147},
 archivePrefix={arXiv},
 primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
 title={Efficient Natural Language Response Suggestion for Smart Reply},
 author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
 year={2017},
 eprint={1705.00652},
 archivePrefix={arXiv},
 primaryClass={cs.CL}
}

Safetensors

Model size: 0.1B params
Tensor type: F32

URL: https://huggingface.co/estrogen/ModernBERT-base-nli-v3
