
URL: https://huggingface.co/johnnyboycurtis/ModernBERT-small-1.5-sts



ModernBERT-small-1.5 for General Purpose Similarity

This is a sentence-transformers model trained on the nli, quora, natural_questions, stsb, sentence_compression, simple_wiki, altlex, coco_captions, flickr30k_captions, yahoo_answers and stack_exchange datasets. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
 (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
 (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
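
The Pooling module above has only pooling_mode_mean_tokens enabled, so the sentence vector is the average of the token vectors. The following is a minimal pure-Python sketch of masked mean pooling, not the library's implementation; the function name mean_pool and the toy 2-dimensional inputs are illustrative assumptions (the real model produces 384-dimensional token vectors).

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in total]


# Toy example: two real tokens followed by one padding token.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The padding vector does not affect the result because its mask entry is 0.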

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("johnnyboycurtis/ModernBERT-small-1.5-sts")
# Run inference
queries = [
 "A sleeping baby in a pink striped outfit.",
]
documents = [
 'A little baby cradled in someones arms.',
 'A group of hikers traveling along a rock strewn creek bed.',
 'Three young men and a young woman wearing sneakers are leaping in midair at the top of a flight of concrete stairs.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 384) (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.5800, 0.0298, -0.0471]])
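
To make the similarity scores concrete, here is a small sketch of cosine similarity and ranking in plain Python. It assumes the model's similarity function is cosine, which matches the cosine metrics reported in this card; the helper names cosine_similarity and rank_documents and the toy 2-dimensional vectors are hypothetical and stand in for real 384-dimensional embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(
    query: Sequence[float], documents: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query, doc)) for i, doc in enumerate(documents)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumed for illustration only).
query_vec = [1.0, 0.0]
doc_vecs = [[0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]]
for index, score in rank_documents(query_vec, doc_vecs):
    print(index, round(score, 4))
```

With real embeddings, the same ranking step would place 'A little baby cradled in someones arms.' first, since it has the highest score in the output above.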

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.8759
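
As a rough illustration of what cosine_accuracy measures, the sketch below counts the fraction of (anchor, positive, negative) triplets in which the anchor is closer, by cosine similarity, to the positive than to the negative. The function names and the toy triplets are assumptions made for this example; they are not the evaluation data behind the reported value.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def triplet_cosine_accuracy(triplets: Sequence[Tuple[Vector, Vector, Vector]]) -> float:
    """Share of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)


# Toy triplets: the second one is deliberately ordered the wrong way round.
toy_triplets = [
    ([1.0, 0.0], [1.0, 0.1], [0.0, 1.0]),
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.1]),
    ([0.0, 1.0], [0.2, 1.0], [1.0, 0.0]),
    ([1.0, 1.0], [1.0, 0.9], [-1.0, -1.0]),
]
print(triplet_cosine_accuracy(toy_triplets))  # 0.75
```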

Semantic Similarity

Metric Value
pearson_cosine 0.8249
spearman_cosine 0.8234
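
The two metrics above compare the model's cosine scores for sentence pairs against gold similarity labels: pearson_cosine measures linear correlation and spearman_cosine measures rank correlation. A minimal sketch of both, assuming tied values receive averaged ranks; the helper names and the toy scores are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (assumed): predicted cosine scores vs. gold similarity labels.
predicted = [0.1, 0.4, 0.35, 0.8]
gold = [0.0, 2.5, 3.0, 5.0]
print(round(pearson(predicted, gold), 4), round(spearman(predicted, gold), 4))
```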

Training Details

Training Datasets

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • learning_rate: 0.0005
  • weight_decay: 0.01
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.05
  • bf16: True
  • bf16_full_eval: True
  • load_best_model_at_end: True
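
To show how learning_rate, lr_scheduler_type and warmup_ratio interact, the sketch below computes a cosine schedule with linear warmup: the rate climbs linearly to 0.0005 over the first 5% of steps, then decays along half a cosine wave to zero. This is a standalone approximation of that kind of schedule, not the trainer's code; cosine_lr is a hypothetical helper, and the total of 1000 steps is an assumption because the card does not state the number of training steps.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float = 5e-4, warmup_ratio: float = 0.05) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # assumed value, for illustration only
for step in (0, 25, 50, 525, 1000):
    print(step, f"{cosine_lr(step, TOTAL_STEPS):.6f}")
```

Under these assumptions the peak of 0.0005 is reached at step 50, and the rate has dropped to half the peak by the midpoint of the decay phase.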

All Hyperparameters

Training Logs

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 5.0.0
  • Transformers: 4.53.1
  • PyTorch: 2.7.1+cu128
  • Accelerate: 1.8.1
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
 title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
 author = "Reimers, Nils and Gurevych, Iryna",
 booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
 month = "11",
 year = "2019",
 publisher = "Association for Computational Linguistics",
 url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
 title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
 author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
 year={2021},
 eprint={2101.06983},
 archivePrefix={arXiv},
 primaryClass={cs.LG}
}

CoSENTLoss

@online{kexuefm-8847,
 title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
 author={Su Jianlin},
 year={2022},
 month={Jan},
 url={https://kexue.fm/archives/8847},
}

Model size: 26.9M params (Safetensors, tensor type BF16)
