URL: https://huggingface.co/johnnyboycurtis/ModernBERT-small-sts


ModernBERT-small for General Purpose Similarity

This is a sentence-transformers model trained on the nli, quora, natural_questions, stsb, sentence_compression, simple_wiki, altlex, coco_captions, flickr30k_captions, yahoo_answers and stack_exchange datasets. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

This model is based on the wide architecture of johnnyboycurtis/ModernBERT-small, which was configured as follows:

from transformers import ModernBertConfig, ModernBertModel

small_modernbert_config = ModernBertConfig(
    hidden_size=384,  # A common dimension for small embedding models
    num_hidden_layers=12,  # Significantly fewer layers than the base's 22
    num_attention_heads=6,  # Must be a divisor of hidden_size
    intermediate_size=1536,  # 4 * hidden_size -- VERY WIDE!!
    max_position_embeddings=1024,  # Max sequence length for the model; originally 8192
)

# Instantiate a randomly initialized model from the config defined above
model = ModernBertModel(small_modernbert_config)
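
As a rough sanity check on these dimensions, the arithmetic below estimates the per-head width and the parameter count. It is a back-of-the-envelope sketch, not the library's own accounting. It assumes a vocabulary of 50368 tokens (the ModernBertConfig default, not stated in this card), a fused QKV projection, a gated (GLU) feed-forward block whose input projection is twice `intermediate_size` wide, no bias terms, and it ignores the small layer-norm weights.

```python
# Values taken from the configuration above
hidden_size = 384
num_hidden_layers = 12
num_attention_heads = 6
intermediate_size = 1536

# ASSUMPTION: default ModernBERT vocabulary size; not stated in this card
vocab_size = 50368

# Each attention head works on an equal slice of the hidden dimension
head_dim = hidden_size // num_attention_heads

# Token embedding table
embedding_params = vocab_size * hidden_size

# ASSUMPTION: fused QKV projection plus an output projection, no biases
attention_params = 3 * hidden_size * hidden_size + hidden_size * hidden_size

# ASSUMPTION: gated feed-forward, so the input projection is 2 * intermediate_size wide
mlp_params = hidden_size * (2 * intermediate_size) + intermediate_size * hidden_size

per_layer_params = attention_params + mlp_params

# Layer-norm weights are omitted; they add only a few thousand parameters
total_params = embedding_params + num_hidden_layers * per_layer_params

print(f"head_dim = {head_dim}")
print(f"estimated parameters = {total_params / 1e6:.1f}M")
```

Under these assumptions the estimate lands close to the model size reported for the checkpoint, and about 40% of the parameters sit in the embedding table.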

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
 (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
 (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
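
The Pooling module above has only `pooling_mode_mean_tokens` enabled, so each sentence embedding is the average of its token embeddings. The NumPy sketch below illustrates that idea on a toy batch. The function name `mean_pool` and the tiny 4-dimensional vectors are illustrative assumptions, and the code is not the library's implementation.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    return summed / counts


# Toy batch: 2 sentences, 3 token positions, 4 dimensions (the real model uses 384)
token_embeddings = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
attention_mask = np.array([[1, 1, 0],   # last position is padding
                           [1, 1, 1]])

sentence_embeddings = mean_pool(token_embeddings, attention_mask)
print(sentence_embeddings.shape)
print(sentence_embeddings)
```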

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("johnnyboycurtis/ModernBERT-small-sts")
# Run inference
queries = [
 "A sleeping baby in a pink striped outfit.",
]
documents = [
 'A little baby cradled in someones arms.',
 'A group of hikers traveling along a rock strewn creek bed.',
 'Three young men and a young woman wearing sneakers are leaping in midair at the top of a flight of concrete stairs.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 384) (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.5804, 0.0193, -0.1261]])
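
For readers who want to see what the similarity scores represent, here is a minimal NumPy sketch of a cosine similarity matrix. It assumes the model uses cosine similarity, which is the Sentence Transformers default when no other similarity function is configured. The helper name `cosine_similarity_matrix` and the 2-dimensional toy vectors are illustrative only.

```python
import numpy as np


def cosine_similarity_matrix(a, b) -> np.ndarray:
    """Return the (len(a), len(b)) matrix of cosine similarities between rows."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)  # scale each row to unit length
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Toy vectors standing in for query and document embeddings
toy_query = [[1.0, 0.0]]
toy_documents = [[2.0, 0.0],    # same direction as the query
                 [0.0, 3.0],    # orthogonal to the query
                 [-1.0, 0.0]]   # opposite direction

toy_scores = cosine_similarity_matrix(toy_query, toy_documents)
print(toy_scores)
```

Scores range from -1 to 1, and vector length has no effect: only the angle between embeddings matters.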

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.8808
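
As an illustration of how a triplet cosine accuracy can be read, the sketch below counts the fraction of (anchor, positive, negative) triplets in which the anchor is more cosine-similar to the positive than to the negative. The helper names and the toy vectors are hypothetical, and the evaluator that produced the number above may differ in detail.

```python
import numpy as np


def rowwise_cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between matching rows of a and b."""
    dot = (a * b).sum(axis=1)
    return dot / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))


def triplet_cosine_accuracy(anchors, positives, negatives) -> float:
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    anchors = np.asarray(anchors, dtype=np.float64)
    positives = np.asarray(positives, dtype=np.float64)
    negatives = np.asarray(negatives, dtype=np.float64)
    pos_sim = rowwise_cosine(anchors, positives)
    neg_sim = rowwise_cosine(anchors, negatives)
    return float(np.mean(pos_sim > neg_sim))


# Two toy triplets: the first is ranked correctly, the second is not
toy_anchors = [[1.0, 0.0], [0.0, 1.0]]
toy_positives = [[1.0, 0.1], [1.0, 0.0]]
toy_negatives = [[0.0, 1.0], [0.0, 1.0]]

toy_accuracy = triplet_cosine_accuracy(toy_anchors, toy_positives, toy_negatives)
print(toy_accuracy)
```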

Semantic Similarity

Metric Value
pearson_cosine 0.829
spearman_cosine 0.8276
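
The two metrics above compare the model's cosine scores with human similarity labels: Pearson measures linear agreement, while Spearman measures agreement of the rankings. The sketch below shows both on made-up numbers. The scores and labels are invented for illustration, and the simple ranking helper assumes there are no tied values.

```python
import numpy as np


def pearson(x, y) -> float:
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def ranks(x) -> np.ndarray:
    """Rank positions (0 = smallest). ASSUMPTION: no tied values."""
    x = np.asarray(x, dtype=np.float64)
    order = np.argsort(x)
    result = np.empty(len(x), dtype=np.float64)
    result[order] = np.arange(len(x))
    return result


def spearman(x, y) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented example: model cosine scores vs. human similarity labels
toy_cosine_scores = [0.10, 0.40, 0.35, 0.90]
toy_gold_labels = [0.0, 2.0, 3.0, 5.0]

toy_pearson = pearson(toy_cosine_scores, toy_gold_labels)
toy_spearman = spearman(toy_cosine_scores, toy_gold_labels)
print(toy_pearson, toy_spearman)
```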

Training Details

Training Datasets

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • learning_rate: 0.0005
  • weight_decay: 0.01
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.05
  • bf16: True
  • bf16_full_eval: True
  • load_best_model_at_end: True
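
To make the interaction of `learning_rate`, `lr_scheduler_type: cosine`, and `warmup_ratio` concrete, the sketch below computes a learning rate per step: a linear ramp from 0 to 0.0005 over the first 5% of steps, followed by a half-cosine decay towards 0. The total of 1000 steps is a hypothetical value (the card does not report the step count), and the function `lr_at_step` is an illustrative re-derivation of the usual schedule shape, not the trainer's own code.

```python
import math


def lr_at_step(step: int, total_steps: int, base_lr: float = 5e-4, warmup_ratio: float = 0.05) -> float:
    """Learning rate under linear warmup followed by cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# ASSUMPTION: hypothetical run length, chosen only for illustration
total_steps = 1000

for step in (0, 25, 50, 525, 1000):
    print(step, f"{lr_at_step(step, total_steps):.6f}")
```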

All Hyperparameters

Training Logs

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 5.0.0
  • Transformers: 4.53.1
  • PyTorch: 2.7.1+cu128
  • Accelerate: 1.8.1
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
 title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
 author = "Reimers, Nils and Gurevych, Iryna",
 booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
 month = "11",
 year = "2019",
 publisher = "Association for Computational Linguistics",
 url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
 title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
 author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
 year={2021},
 eprint={2101.06983},
 archivePrefix={arXiv},
 primaryClass={cs.LG}
}

CoSENTLoss

@online{kexuefm-8847,
 title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
 author={Su Jianlin},
 year={2022},
 month={Jan},
 url={https://kexue.fm/archives/8847},
}

Model size (Safetensors): 47.7M params
Tensor type: BF16
