URL: https://huggingface.co/estrogen/ModernBERT-base-marco

estrogen/ModernBERT-base-marco


SentenceTransformer based on estrogen/ModernBERT-base-sbert-initialized

This is a sentence-transformers model finetuned from estrogen/ModernBERT-base-sbert-initialized on the msmarco-bm25 dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
 (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
 (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
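
The Pooling module above enables only pooling_mode_mean_tokens, so a sentence embedding is the average of its token embeddings. The following is a minimal pure-Python sketch of masked mean pooling under that reading. The function name `mean_pool` and the toy inputs are illustrative and not part of the library, which operates on batched tensors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [total / count for total in summed]


# Toy example: three token vectors of width 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

In the real model the token vectors have width 768, matching word_embedding_dimension.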

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("estrogen/ModernBERT-base-marco")
# Run inference
sentences = [
 '[unused0]what period do we live in',
 '[unused1]Earth is currently in the Quaternary Period of the Cenozoic Era.',
 '[unused1]There’s a big difference in drive time depending on what part of Ewa Beach you live in — the homes in Ocean Point (most southern part of Ewa) have a 20 minute longer drive than we do, even though we both live in the town of Ewa Beach.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
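
The example sentences above carry the same prefixes that the training prompts use: `[unused0]` for queries and `[unused1]` for passages. The sketch below shows one way to apply those prefixes and rank passages for a query. The helper names (`with_prefix`, `rank`, `search`) are hypothetical, and `search` assumes `model` is an already loaded SentenceTransformer.

```python
import math

QUERY_PREFIX = "[unused0]"     # prompt used for queries during training
DOCUMENT_PREFIX = "[unused1]"  # prompt used for positives and negatives


def with_prefix(texts, prefix):
    """Prepend the training prompt to every text."""
    return [prefix + text for text in texts]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_embedding, document_embeddings):
    """Return document indices sorted from most to least similar."""
    scores = [cosine(query_embedding, d) for d in document_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def search(model, query, documents):
    """Rank `documents` for `query`; `model` is assumed to be a SentenceTransformer."""
    query_embedding = model.encode(with_prefix([query], QUERY_PREFIX))[0]
    document_embeddings = model.encode(with_prefix(documents, DOCUMENT_PREFIX))
    return [documents[i] for i in rank(query_embedding, document_embeddings)]


# Toy check of the ranking step with hand-written 2-d vectors.
print(rank([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]))  # [1, 2, 0]
```

Omitting the prefixes at inference time would feed the model inputs that differ from what it saw in training, so keeping them is the safer default.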

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.9537

Triplet

Metric Value
cosine_accuracy 0.9587
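
For triplets, cosine accuracy is conventionally the fraction of (query, positive, negative) triplets in which the query is more similar to the positive than to the negative. A minimal sketch of that computation follows, assuming this conventional definition. The function name and toy vectors are illustrative, and the card does not say which splits produced the two values above.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def triplet_cosine_accuracy(anchors, positives, negatives):
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = 0
    for a, p, n in zip(anchors, positives, negatives):
        if cosine(a, p) > cosine(a, n):
            correct += 1
    return correct / len(anchors)


# Two toy triplets: the first is ranked correctly, the second is not.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.1], [1.0, 0.0]]
negatives = [[0.0, 1.0], [0.0, 1.0]]
accuracy = triplet_cosine_accuracy(anchors, positives, negatives)
print(accuracy)  # 0.5
```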

Training Details

Training Dataset

msmarco-bm25

  • Dataset: msmarco-bm25 at ce8a493
  • Size: 19,139,199 training samples
  • Columns: query, positive, and negative
  • Approximate statistics based on the first 1000 samples (all columns are of type string):
    • query: min 6 tokens, mean 10.51 tokens, max 17 tokens
    • positive: min 45 tokens, mean 86.37 tokens, max 212 tokens
    • negative: min 25 tokens, mean 80.75 tokens, max 222 tokens
  • Samples (each sample is a query, positive, negative triplet):
    • query: [unused0]what are the liberal arts?
      positive: [unused1]liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects.
      negative: [unused1]The New York State Education Department requires 60 Liberal Arts credits in a Bachelor of Science program and 90 Liberal Arts credits in a Bachelor of Arts program. In the list of course descriptions, courses which are liberal arts for all students are identified by (Liberal Arts) after the course number.
    • query: [unused0]what are the liberal arts?
      positive: [unused1]liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects.
      negative: [unused1]You can choose from an array of liberal arts majors. Most of these are offered in the liberal arts departments of colleges that belong to universities and at smaller colleges that are designated as liberal arts institutions.
    • query: [unused0]what are the liberal arts?
      positive: [unused1]liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects.
      negative: [unused1]Majors. You can choose from an array of liberal arts majors. Most of these are offered in the liberal arts departments of colleges that belong to universities and at smaller colleges that are designated as liberal arts institutions.
  • Loss: MatryoshkaLoss with these parameters:
    {
     "loss": "MultipleNegativesRankingLoss",
     "matryoshka_dims": [
     768,
     512,
     256,
     128,
     64
     ],
     "matryoshka_weights": [
     1,
     0.9,
     0.81,
     0.7290000000000001,
     0.6561
     ],
     "n_dims_per_step": -1
    }
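
The configuration above wraps MultipleNegativesRankingLoss in MatryoshkaLoss: the ranking loss is evaluated on embeddings truncated to each listed dimension, and the results are combined with the listed weights, which decay by a factor of 0.9 per step. The pure-Python sketch below illustrates that idea on toy vectors. It is not the library's implementation; the function names are hypothetical, and `scale=20.0` is assumed from the library's documented default because the card does not list it.

```python
import math

# Values copied from the loss configuration above.
MATRYOSHKA_DIMS = [768, 512, 256, 128, 64]
# The listed weights follow a geometric decay of 0.9 per step.
MATRYOSHKA_WEIGHTS = [0.9 ** i for i in range(len(MATRYOSHKA_DIMS))]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ranking_loss(queries, candidates, scale=20.0):
    """In-batch softmax cross-entropy: candidates[i] is the match for queries[i].

    Every other candidate in the batch acts as a negative.
    scale=20.0 is an assumption; the card does not state it.
    """
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * cosine(query, c) for c in candidates]
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)


def matryoshka_loss(queries, candidates, dims, weights):
    """Weighted sum of the ranking loss over truncated embeddings."""
    return sum(
        w * ranking_loss([q[:d] for q in queries], [c[:d] for c in candidates])
        for d, w in zip(dims, weights)
    )


# Toy 4-dimensional batch, evaluated at 4 and 2 dimensions.
queries = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
aligned = [[0.9, 0.1, 0.2, 0.0], [0.1, 0.9, 0.0, 0.2]]
swapped = [aligned[1], aligned[0]]
good = matryoshka_loss(queries, aligned, [4, 2], [1.0, 0.9])
bad = matryoshka_loss(queries, swapped, [4, 2], [1.0, 0.9])
print(good < bad)  # True
```

Correctly paired candidates give a lower loss than swapped ones at every truncation level, which is the signal the weighted sum rewards.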
    

Evaluation Dataset

msmarco-bm25

  • Dataset: msmarco-bm25 at ce8a493
  • Size: 19,139,199 evaluation samples
  • Columns: query, positive, and negative
  • Approximate statistics based on the first 1000 samples (all columns are of type string):
    • query: min 5 tokens, mean 10.42 tokens, max 23 tokens
    • positive: min 20 tokens, mean 80.07 tokens, max 167 tokens
    • negative: min 18 tokens, mean 82.48 tokens, max 213 tokens
  • Samples (each sample is a query, positive, negative triplet):
    • query: [unused0]different uses of corn
      positive: [unused1]Corn or maize oil is extracted from the germ of corn, and its main use is for cooking. It is also a key ingredient in margarine and other processed foods. Corn oil is also a feedstock used for biodiesel.From 2012 to 2014, the use of nonfood-grade (NFG) corn oil for biodiesel production has grown tremendously.ses of Corn Oil. Apart from serving as a less-than-ideal cooking oil, corn oil has several industrial uses, including as an addition to soap, salve, paint, ink, textiles, and insecticides. It also sometimes functions as a carrier for drug molecules in pharmaceutical products.
      negative: [unused1]Impact of Ethanol on Corn Prices. The U.S. produces 40 percent of the world’s corn, [5] and ethanol production uses about 40 percent of U.S. corn production, [6] but roughly one-third of the value of the corn used in ethanol production returns to the feed market as DDGS.
    • query: [unused0]different uses of corn
      positive: [unused1]Corn or maize oil is extracted from the germ of corn, and its main use is for cooking. It is also a key ingredient in margarine and other processed foods. Corn oil is also a feedstock used for biodiesel.From 2012 to 2014, the use of nonfood-grade (NFG) corn oil for biodiesel production has grown tremendously.ses of Corn Oil. Apart from serving as a less-than-ideal cooking oil, corn oil has several industrial uses, including as an addition to soap, salve, paint, ink, textiles, and insecticides. It also sometimes functions as a carrier for drug molecules in pharmaceutical products.
      negative: [unused1]But ask different reptile keepers how long corn do corn snakes get and you won't get one standard answer. Like us humans, who may grow to little more than 5 feet tall to well over 6 feet in adults, different corn snakes attain different sizes.
    • query: [unused0]different uses of corn
      positive: [unused1]Corn or maize oil is extracted from the germ of corn, and its main use is for cooking. It is also a key ingredient in margarine and other processed foods. Corn oil is also a feedstock used for biodiesel.From 2012 to 2014, the use of nonfood-grade (NFG) corn oil for biodiesel production has grown tremendously.ses of Corn Oil. Apart from serving as a less-than-ideal cooking oil, corn oil has several industrial uses, including as an addition to soap, salve, paint, ink, textiles, and insecticides. It also sometimes functions as a carrier for drug molecules in pharmaceutical products.
      negative: [unused1]The corn system uses a large amount of natural resources. Even though it does not deliver as much food as comparable systems around the globe, the American corn system continues to use a large proportion of our country’s natural resources.he corn system uses a large amount of natural resources. Even though it does not deliver as much food as comparable systems around the globe, the American corn system continues to use a large proportion of our country’s natural resources.
  • Loss: MatryoshkaLoss with these parameters:
    {
     "loss": "MultipleNegativesRankingLoss",
     "matryoshka_dims": [
     768,
     512,
     256,
     128,
     64
     ],
     "matryoshka_weights": [
     1,
     0.9,
     0.81,
     0.7290000000000001,
     0.6561
     ],
     "n_dims_per_step": -1
    }
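
Because the loss is applied at 768, 512, 256, 128, and 64 dimensions, the leading components of each embedding are trained to be usable on their own. A common way to exploit this is to keep only the first components and re-normalize, as in the hedged sketch below. The helper name is hypothetical, and the card does not report quality at the reduced sizes, so any accuracy trade-off would need to be measured.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


# Toy 4-dimensional vector truncated to 2 dimensions.
full = [3.0, 4.0, 12.0, 84.0]
short = truncate_and_normalize(full, 2)
print(short)  # [0.6, 0.8]
```

The Sentence Transformers library also accepts a `truncate_dim` argument when constructing a `SentenceTransformer`, which performs the truncation inside `encode`.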
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
  • prompts: {'query': '[unused0]', 'positive': '[unused1]', 'negative': '[unused1]'}
  • batch_sampler: no_duplicates
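
The non-default values above map onto keyword arguments of `SentenceTransformerTrainingArguments`. The sketch below shows how such a configuration could be written. It is a reconstruction from the listed values, not the author's training script. The function name and the `output_dir` parameter are hypothetical, and the imports are placed inside the function only so that the snippet can be loaded without the library installed.

```python
# Prompts copied from the hyperparameter list: queries and passages
# receive different reserved-token prefixes.
PROMPTS = {"query": "[unused0]", "positive": "[unused1]", "negative": "[unused1]"}


def build_training_args(output_dir):
    """Return training arguments mirroring the non-default values listed above.

    `output_dir` is a hypothetical checkpoint path chosen by the caller.
    """
    from sentence_transformers import SentenceTransformerTrainingArguments
    from sentence_transformers.training_args import BatchSamplers

    return SentenceTransformerTrainingArguments(
        output_dir=output_dir,
        eval_strategy="steps",
        per_device_train_batch_size=256,
        per_device_eval_batch_size=256,
        num_train_epochs=1,
        warmup_ratio=0.1,
        bf16=True,  # requires hardware with bfloat16 support
        prompts=PROMPTS,
        # Avoid repeated texts within a batch, which would otherwise act as
        # false negatives for the in-batch ranking loss.
        batch_sampler=BatchSamplers.NO_DUPLICATES,
    )
```

All other settings would fall back to the library defaults for the versions listed under Framework Versions.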

All Hyperparameters

Training Logs

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.48.0.dev0
  • PyTorch: 2.1.0+cu118
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
 title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
 author = "Reimers, Nils and Gurevych, Iryna",
 booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
 month = "11",
 year = "2019",
 publisher = "Association for Computational Linguistics",
 url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
 title={Matryoshka Representation Learning},
 author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
 year={2024},
 eprint={2205.13147},
 archivePrefix={arXiv},
 primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
 title={Efficient Natural Language Response Suggestion for Smart Reply},
 author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
 year={2017},
 eprint={1705.00652},
 archivePrefix={arXiv},
 primaryClass={cs.CL}
}

Model size: 0.1B params (Safetensors)
Tensor type: F32
