SentenceTransformer based on estrogen/ModernBERT-base-sbert-initialized
This is a sentence-transformers model finetuned from estrogen/ModernBERT-base-sbert-initialized on the msmarco-bm25 dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: estrogen/ModernBERT-base-sbert-initialized
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: msmarco-bm25
- Language: en
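The similarity function listed above is cosine similarity. As a minimal sketch in plain Python (the helper name `cosine_similarity` is illustrative and not part of the library), the score between two vectors is their dot product divided by the product of their lengths:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for 768-dimensional embeddings.
same_direction = cosine_similarity([1.0, 0.0, 0.0], [2.0, 0.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(same_direction, orthogonal)
```

Because the score depends only on direction, scaling a vector does not change its similarity to other vectors.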
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
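The pooling layer above has `pooling_mode_mean_tokens` set to True, so the sentence embedding is the average of the token embeddings. The following is a toy sketch of mask-aware mean pooling in plain Python; the function name `mean_pool` and the 2-dimensional vectors are illustrative assumptions, not the library's code:

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-zero mask to avoid dividing by zero.
    return [value / max(count, 1) for value in total]

# Two real tokens and one padding token (mask 0) that must not affect the mean.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)
```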
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("estrogen/ModernBERT-base-marco")
# Run inference
sentences = [
    "[unused0]what period do we live in",
    "[unused1]Earth is currently in the Quaternary Period of the Cenozoic Era.",
    "[unused1]There's a big difference in drive time depending on what part of Ewa Beach you live in \u2014 the homes in Ocean Point (most southern part of Ewa) have a 20 minute longer drive than we do, even though we both live in the town of Ewa Beach.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
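The example sentences above start with `[unused0]` for the query and `[unused1]` for the passages, matching the prompts listed under the training hyperparameters. A small sketch of adding those prefixes before encoding, where the helper `with_prompt` and the constant names are hypothetical:

```python
# Prefixes taken from the training prompts reported in this card.
QUERY_PROMPT = "[unused0]"
DOCUMENT_PROMPT = "[unused1]"

def with_prompt(texts, prompt):
    """Return a new list with the prompt prepended to every text."""
    return [prompt + text for text in texts]

queries = with_prompt(["what period do we live in"], QUERY_PROMPT)
documents = with_prompt(
    ["Earth is currently in the Quaternary Period of the Cenozoic Era."],
    DOCUMENT_PROMPT,
)
# The combined list has the same shape as `sentences` in the snippet above
# and could be passed to model.encode in the same way.
prefixed_sentences = queries + documents
print(prefixed_sentences)
```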
Evaluation
Metrics
Triplet
- Dataset: ms_marco
- Evaluated with TripletEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy | 0.9537 |
Triplet
- Dataset: ms_marco
- Evaluated with TripletEvaluator
| Metric | Value |
|---|---|
| cosine_accuracy | 0.9587 |
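For intuition, cosine accuracy on triplets can be read as the fraction of (query, positive, negative) triplets in which the query is more similar to the positive than to the negative. The sketch below uses toy 2-dimensional vectors and illustrative function names; it is not the evaluator's implementation:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def triplet_cosine_accuracy(triplets):
    """Share of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    correct = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return correct / len(triplets)

toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),  # positive is closer: counted as correct
    ([1.0, 0.0], [0.1, 0.9], [0.8, 0.2]),  # negative is closer: counted as wrong
]
accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)
```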
Training Details
Training Dataset
msmarco-bm25
- Dataset: msmarco-bm25 at ce8a493
- Size: 19,139,199 training samples
- Columns: query, positive, and negative
- Approximate statistics based on the first 1000 samples:

| | query | positive | negative |
|---|---|---|---|
| type | string | string | string |
| details | min: 6 tokens; mean: 10.51 tokens; max: 17 tokens | min: 45 tokens; mean: 86.37 tokens; max: 212 tokens | min: 25 tokens; mean: 80.75 tokens; max: 222 tokens |

- Samples:

| query | positive | negative |
|---|---|---|
| [unused0]what are the liberal arts? | [unused1]liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects. | [unused1]The New York State Education Department requires 60 Liberal Arts credits in a Bachelor of Science program and 90 Liberal Arts credits in a Bachelor of Arts program. In the list of course descriptions, courses which are liberal arts for all students are identified by (Liberal Arts) after the course number. |
| [unused0]what are the liberal arts? | [unused1]liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects. | [unused1]You can choose from an array of liberal arts majors. Most of these are offered in the liberal arts departments of colleges that belong to universities and at smaller colleges that are designated as liberal arts institutions. |
| [unused0]what are the liberal arts? | [unused1]liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects. | [unused1]Majors. You can choose from an array of liberal arts majors. Most of these are offered in the liberal arts departments of colleges that belong to universities and at smaller colleges that are designated as liberal arts institutions. |

- Loss: MatryoshkaLoss with these parameters:
{
  "loss": "MultipleNegativesRankingLoss",
  "matryoshka_dims": [768, 512, 256, 128, 64],
  "matryoshka_weights": [1, 0.9, 0.81, 0.7290000000000001, 0.6561],
  "n_dims_per_step": -1
}
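The parameters above wrap an in-batch ranking loss in a Matryoshka scheme: the loss is computed on embeddings truncated to each listed dimension and the results are combined with the listed weights, which are powers of 0.9. The following is a simplified plain-Python sketch of that idea, not the library's implementation. The function names, the toy 4-dimensional vectors, the toy dims `[4, 2]`, and the similarity scale of 20 are assumptions made for illustration.

```python
import math

def truncate_and_normalize(vector, dim):
    """Keep the first `dim` values and rescale them to unit length."""
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0  # avoid dividing by zero
    return [x / norm for x in head]

def in_batch_ranking_loss(queries, candidates, scale=20.0):
    """Mean cross-entropy where candidate i is the correct match for query i."""
    losses = []
    for i, q in enumerate(queries):
        logits = [scale * sum(x * y for x, y in zip(q, c)) for c in candidates]
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

def matryoshka_loss(queries, positives, negatives, dims, weights):
    """Weighted sum of the ranking loss over progressively shorter embeddings."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        q = [truncate_and_normalize(v, dim) for v in queries]
        # Positives come first so that candidate i matches query i;
        # the explicit negatives are appended as extra wrong answers.
        c = [truncate_and_normalize(v, dim) for v in positives + negatives]
        total += weight * in_batch_ranking_loss(q, c)
    return total

queries = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
positives = [[0.9, 0.1, 0.0, 0.0], [0.1, 0.9, 0.0, 0.0]]
negatives = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

good_loss = matryoshka_loss(queries, positives, negatives, dims=[4, 2], weights=[1, 0.9])
# Swapping the positives pairs each query with the wrong passage.
bad_loss = matryoshka_loss(queries, positives[::-1], negatives, dims=[4, 2], weights=[1, 0.9])
print(good_loss, bad_loss)
```

Well-matched pairs give a loss near zero at every truncation, while mismatched pairs are penalized at each dimension in proportion to its weight.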
Evaluation Dataset
msmarco-bm25
- Dataset: msmarco-bm25 at ce8a493
- Size: 19,139,199 evaluation samples
- Columns: query, positive, and negative
- Approximate statistics based on the first 1000 samples:

| | query | positive | negative |
|---|---|---|---|
| type | string | string | string |
| details | min: 5 tokens; mean: 10.42 tokens; max: 23 tokens | min: 20 tokens; mean: 80.07 tokens; max: 167 tokens | min: 18 tokens; mean: 82.48 tokens; max: 213 tokens |

- Samples:

| query | positive | negative |
|---|---|---|
| [unused0]different uses of corn | [unused1]Corn or maize oil is extracted from the germ of corn, and its main use is for cooking. It is also a key ingredient in margarine and other processed foods. Corn oil is also a feedstock used for biodiesel.From 2012 to 2014, the use of nonfood-grade (NFG) corn oil for biodiesel production has grown tremendously.ses of Corn Oil. Apart from serving as a less-than-ideal cooking oil, corn oil has several industrial uses, including as an addition to soap, salve, paint, ink, textiles, and insecticides. It also sometimes functions as a carrier for drug molecules in pharmaceutical products. | [unused1]Impact of Ethanol on Corn Prices. The U.S. produces 40 percent of the world's corn, [5] and ethanol production uses about 40 percent of U.S. corn production, [6] but roughly one-third of the value of the corn used in ethanol production returns to the feed market as DDGS. |
| [unused0]different uses of corn | [unused1]Corn or maize oil is extracted from the germ of corn, and its main use is for cooking. It is also a key ingredient in margarine and other processed foods. Corn oil is also a feedstock used for biodiesel.From 2012 to 2014, the use of nonfood-grade (NFG) corn oil for biodiesel production has grown tremendously.ses of Corn Oil. Apart from serving as a less-than-ideal cooking oil, corn oil has several industrial uses, including as an addition to soap, salve, paint, ink, textiles, and insecticides. It also sometimes functions as a carrier for drug molecules in pharmaceutical products. | [unused1]But ask different reptile keepers how long corn do corn snakes get and you won't get one standard answer. Like us humans, who may grow to little more than 5 feet tall to well over 6 feet in adults, different corn snakes attain different sizes. |
| [unused0]different uses of corn | [unused1]Corn or maize oil is extracted from the germ of corn, and its main use is for cooking. It is also a key ingredient in margarine and other processed foods. Corn oil is also a feedstock used for biodiesel.From 2012 to 2014, the use of nonfood-grade (NFG) corn oil for biodiesel production has grown tremendously.ses of Corn Oil. Apart from serving as a less-than-ideal cooking oil, corn oil has several industrial uses, including as an addition to soap, salve, paint, ink, textiles, and insecticides. It also sometimes functions as a carrier for drug molecules in pharmaceutical products. | [unused1]The corn system uses a large amount of natural resources. Even though it does not deliver as much food as comparable systems around the globe, the American corn system continues to use a large proportion of our country's natural resources.he corn system uses a large amount of natural resources. Even though it does not deliver as much food as comparable systems around the globe, the American corn system continues to use a large proportion of our country's natural resources. |

- Loss: MatryoshkaLoss with these parameters:
{
  "loss": "MultipleNegativesRankingLoss",
  "matryoshka_dims": [768, 512, 256, 128, 64],
  "matryoshka_weights": [1, 0.9, 0.81, 0.7290000000000001, 0.6561],
  "n_dims_per_step": -1
}
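Since the loss was applied at dimensions 768, 512, 256, 128, and 64, a shortened embedding is expected to remain usable when only its leading values are kept. A minimal sketch of truncating an embedding and rescaling it to unit length follows; the helper name and the toy 8-dimensional vector are illustrative assumptions. Sentence Transformers also accepts a `truncate_dim` argument when constructing `SentenceTransformer`, which is not shown here.

```python
import math

def truncate_embedding(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

# Toy 8-dimensional vector standing in for a 768-dimensional embedding.
full = [0.5, 0.5, 0.5, 0.5, 0.1, 0.1, 0.1, 0.1]
small = truncate_embedding(full, 4)
print(len(small), small)
```

Shorter vectors reduce storage and comparison cost; how much retrieval quality is retained at each size is not reported in this card.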
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 256
- per_device_eval_batch_size: 256
- num_train_epochs: 1
- warmup_ratio: 0.1
- bf16: True
- prompts: {'query': '[unused0]', 'positive': '[unused1]', 'negative': '[unused1]'}
- batch_sampler: no_duplicates
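The `no_duplicates` batch sampler matters with an in-batch ranking loss, because every other passage in a batch is treated as a negative: if the same positive appeared twice, it would be scored as a wrong answer for its own query. The dataset samples in this card repeat the same query and positive with different negatives, so this case does arise. Below is a simplified greedy sketch of the idea, not the library's sampler; the function name and toy samples are hypothetical.

```python
def no_duplicate_batches(samples, batch_size):
    """Greedily group samples so that no text appears twice within one batch."""
    remaining = list(samples)
    batches = []
    while remaining:
        batch, seen, leftover = [], set(), []
        for sample in remaining:
            texts = set(sample)
            if len(batch) < batch_size and not (texts & seen):
                batch.append(sample)
                seen |= texts
            else:
                # Deferred to a later batch because of a clash or a full batch.
                leftover.append(sample)
        batches.append(batch)
        remaining = leftover
    return batches

# (query, positive, negative) triplets; the first two share query and positive.
toy_samples = [
    ("q1", "p1", "n1a"),
    ("q1", "p1", "n1b"),
    ("q2", "p2", "n2"),
]
batches = no_duplicate_batches(toy_samples, batch_size=2)
print(batches)
```

In this sketch the final batch may be smaller than `batch_size`; a real sampler may handle incomplete batches differently.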
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.3.1
- Transformers: 4.48.0.dev0
- PyTorch: 2.1.0+cu118
- Accelerate: 1.2.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}