sentence-transformers 5.6.0

pip install sentence-transformers

Embeddings, Retrieval, and Reranking

Meta
  • License Expression: Apache-2.0 (SPDX)
  • Author: Nils Reimers
  • Maintainer: Tom Aarsen
  • Tags: Transformer Networks, BERT, XLNet, sentence embedding, PyTorch, NLP, deep learning
  • Requires: Python >=3.10
  • Provides-Extra: image, audio, video, train, onnx, onnx-gpu, openvino, dev

Project description

Sentence Transformers: Embeddings, Retrieval, and Reranking

This framework provides an easy way to access, use, and train state-of-the-art embedding and reranker models. It can compute embeddings using Sentence Transformer models (quickstart), calculate similarity scores using Cross-Encoder (a.k.a. reranker) models (quickstart), or generate sparse embeddings using Sparse Encoder models (quickstart). This enables a wide range of applications, including semantic search, semantic textual similarity, and paraphrase mining.

Over 15,000 pre-trained Sentence Transformers models are available for immediate use on 🤗 Hugging Face, including many of the state-of-the-art models from the Massive Text Embeddings Benchmark (MTEB) leaderboard. It is also easy to train or fine-tune your own embedding, reranker, or sparse encoder models with Sentence Transformers, so you can create custom models for your specific use cases.

For the full documentation, see www.SBERT.net.

Installation

We recommend Python 3.10+, PyTorch 1.11.0+, and transformers v4.41.0+.

pip install -U sentence-transformers

See Installation in the docs for uv, conda, source, and editable installs, CUDA setup, and extras ([image], [audio], [video], [train], [onnx], [openvino], [dev]).
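
To check an environment against the recommended minimums above, a small sketch along these lines may help. It uses only the standard library. The helper name installed_versions is hypothetical, and the distribution names sentence-transformers, torch, and transformers are assumed to be the names the packages were installed under.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The package metadata states that Python >=3.10 is required.
python_ok = sys.version_info >= (3, 10)
report = installed_versions(["sentence-transformers", "torch", "transformers"])

print(f"Python {sys.version_info.major}.{sys.version_info.minor} (3.10+: {python_ok})")
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

The sketch only reports versions; comparing them against the recommended minimums is left to the reader.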

Getting Started

See Quickstart in our documentation.

Embedding Models

First, download a pretrained embedding model, a.k.a. a Sentence Transformer model.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

Then provide some texts to the model.

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# => (3, 384)

And that's already it. We now have numpy arrays with the embeddings, one for each text. We can use these to compute similarities.

similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6660, 0.1046],
#         [0.6660, 1.0000, 0.1411],
#         [0.1046, 0.1411, 1.0000]])
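
For intuition about what such a similarity matrix contains, here is a minimal NumPy sketch of pairwise cosine similarity, which is the library's default similarity function unless a model is configured otherwise. It is not the library's implementation: cosine_similarity_matrix is a hypothetical helper, and the 2-dimensional vectors are made up rather than real embeddings.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Normalize each row to unit length, then take all pairwise dot products.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Toy vectors (assumed for illustration only).
toy = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
sims = cosine_similarity_matrix(toy, toy)
print(np.round(sims, 4))
```

As in the output above, the diagonal is 1 (each vector compared with itself) and the matrix is symmetric.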

Reranker Models

First, download a pretrained reranker model, a.k.a. a Cross Encoder model.

from sentence_transformers import CrossEncoder

# 1. Load a pretrained CrossEncoder model
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

Then provide some texts to the model.

# The texts for which to predict similarity scores
query = "How many people live in Berlin?"
passages = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "Berlin has a yearly total of about 135 million day visitors, making it one of the most-visited cities in the European Union.",
    "In 2013 around 600,000 Berliners were registered in one of the more than 2,300 sport and fitness clubs.",
]

# 2a. predict scores for pairs of texts
scores = model.predict([(query, passage) for passage in passages])
print(scores)
# => [8.607139 5.506266 6.352977]

And we're good to go. You can also use model.rank to avoid having to perform the reranking manually:

# 2b. Rank a list of passages for a query
ranks = model.rank(query, passages, return_documents=True)

print("Query:", query)
for rank in ranks:
    print(f"- #{rank['corpus_id']} ({rank['score']:.2f}): {rank['text']}")
"""
Query: How many people live in Berlin?
- #0 (8.61): Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.
- #2 (6.35): In 2013 around 600,000 Berliners were registered in one of the more than 2,300 sport and fitness clubs.
- #1 (5.51): Berlin has a yearly total of about 135 million day visitors, making it one of the most-visited cities in the European Union.
"""
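
To make explicit what "performing the reranking manually" involves, the following sketch sorts passage indices by descending score. It reuses the scores printed by model.predict above as plain numbers, so it runs without loading a model. The variable names are illustrative, and this is not a description of how model.rank is implemented.

```python
# Scores copied from the model.predict output above, in passage order.
scores = [8.607139, 5.506266, 6.352977]

# Sort passage indices so that the highest-scoring passage comes first.
order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
manual_ranks = [{"corpus_id": i, "score": scores[i]} for i in order]

for entry in manual_ranks:
    print(f"- #{entry['corpus_id']} ({entry['score']:.2f})")
# - #0 (8.61)
# - #2 (6.35)
# - #1 (5.51)
```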

Sparse Encoder Models

First, download a pretrained sparse embedding model, a.k.a. a Sparse Encoder model.

from sentence_transformers import SparseEncoder

# 1. Load a pretrained SparseEncoder model
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# The sentences to encode
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

# 2. Calculate sparse embeddings by calling model.encode()
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 30522] - sparse representation with vocabulary size dimensions

# 3. Calculate the embedding similarities
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[35.629,  9.154,  0.098],
#         [ 9.154, 27.478,  0.019],
#         [ 0.098,  0.019, 29.553]])

# 4. Check sparsity stats
stats = SparseEncoder.sparsity(embeddings)
print(f"Sparsity: {stats['sparsity_ratio']:.2%}")
# Sparsity: 99.84%
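
The sparsity figure can be read as the fraction of embedding entries that are zero. The sketch below illustrates that reading on a made-up matrix. It is an assumption about what the statistic means, not the library's implementation, and sparsity_ratio is a hypothetical helper.

```python
import numpy as np


def sparsity_ratio(matrix):
    """Fraction of entries in the matrix that are exactly zero."""
    matrix = np.asarray(matrix)
    return 1.0 - np.count_nonzero(matrix) / matrix.size


# Toy "sparse embeddings": 3 rows of 1000 dimensions with 6 active entries in total.
toy = np.zeros((3, 1000))
toy[0, [5, 17]] = [1.2, 0.4]
toy[1, 42] = 2.0
toy[2, [7, 8, 9]] = 0.5

print(f"Sparsity: {sparsity_ratio(toy):.2%}")
# Sparsity: 99.80%
```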

Pre-Trained Models

We provide a large list of pretrained models for more than 100 languages. Some are general-purpose models, while others produce embeddings for specific use cases.

Training

Tip: Using an AI coding agent (Claude Code, Codex, Cursor, Gemini CLI, ...)? Install the train-sentence-transformers Hugging Face Agent Skill via hf skills add train-sentence-transformers [--claude] [--global] and ask your agent to fine-tune a model on your data.

This framework allows you to fine-tune your own sentence embedding models so that you get task-specific sentence embeddings. Various options are available for tailoring the embeddings to your specific task.

Some highlights across the different types of training are:

  • Support of various transformer networks including BERT, RoBERTa, XLM-R, DistilBERT, Electra, BART, ...
  • Multilingual and multi-task learning
  • Evaluation during training to find the optimal model
  • 20+ loss functions for embedding models, 10+ loss functions for reranker models and 10+ loss functions for sparse embedding models, allowing you to tune models specifically for semantic search, paraphrase mining, semantic similarity comparison, clustering, triplet loss, contrastive loss, etc.

Companion Blog Posts

The following Hugging Face blog posts complement this documentation with narrative walkthroughs and full training examples:

Training guides:

Multimodal:

Efficiency techniques:

Application Examples

You can use this framework for:

and many more use-cases.

For all examples, see examples/sentence_transformer/applications.

Development setup

After cloning the repo (or a fork) to your machine, in a virtual environment, run:

python -m pip install -e ".[dev]"

pre-commit install

To test your changes, run:

pytest

Citing & Authors

If you find this repository helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:

@inproceedings{reimers-2019-sentence-bert,
title="Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author="Reimers, Nils and Gurevych, Iryna",
booktitle="Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month="11",
year="2019",
publisher="Association for Computational Linguistics",
url="https://arxiv.org/abs/1908.10084",
}

If you use one of the multilingual models, feel free to cite our publication Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation:

@inproceedings{reimers-2020-multilingual-sentence-bert,
title="Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
author="Reimers, Nils and Gurevych, Iryna",
booktitle="Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
month="11",
year="2020",
publisher="Association for Computational Linguistics",
url="https://arxiv.org/abs/2004.09813",
}

Please have a look at Publications for our different publications that are integrated into SentenceTransformers.

Maintainers

Maintainer: Tom Aarsen, 🤗 Hugging Face

Don't hesitate to open an issue if something is broken (and it shouldn't be) or if you have further questions.


This project was originally developed by the Ubiquitous Knowledge Processing (UKP) Lab at TU Darmstadt. We're grateful for their foundational work and continued contributions to the field.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sentence_transformers-5.6.0.tar.gz (453.2 kB view details)

Uploaded Source

Built Distribution

sentence_transformers-5.6.0-py3-none-any.whl (596.4 kB view details)

Uploaded Python 3

File details

Details for the file sentence_transformers-5.6.0.tar.gz.

File metadata

  • Download URL: sentence_transformers-5.6.0.tar.gz
  • Upload date:
  • Size: 453.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.6

File hashes

Hashes for sentence_transformers-5.6.0.tar.gz
Algorithm Hash digest
SHA256 0e7164d051e416c1853ade7c274ff52af3f9da0f4be7f0b83d734c27699e1057
MD5 0d0525b352653d35cbdbe16a5b4bd6ad
BLAKE2b-256 f956d2cb00765a6b15c994a7fccf20f9032f16e8193ca49147cb5155166ad744

See more details on using hashes here.

File details

Details for the file sentence_transformers-5.6.0-py3-none-any.whl.

File metadata

File hashes

Hashes for sentence_transformers-5.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d2075b5e687a1611005e20ab04a6846994d51adfcf39610aed066af3c0c0b81f
MD5 06993fd4f1395e1bf889512161f2e761
BLAKE2b-256 76c1dc1582b79e9a2eb0cddf9559cd9bcdff084f541d6fe881fdd9d98630dba7

See more details on using hashes here.
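
As one way to use the digests listed above, the following sketch computes a file's SHA256 with the standard library and compares it with a published value. The helper names sha256_of and matches_published_digest are hypothetical, and the local file path is an assumption about where the wheel was downloaded.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 with a published digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 listed above for sentence_transformers-5.6.0-py3-none-any.whl
WHEEL_SHA256 = "d2075b5e687a1611005e20ab04a6846994d51adfcf39610aed066af3c0c0b81f"

# Assumed local path; adjust to wherever the wheel was downloaded.
wheel_path = Path("sentence_transformers-5.6.0-py3-none-any.whl")
if wheel_path.exists():
    print("digest matches:", matches_published_digest(wheel_path, WHEEL_SHA256))
```

pip can also enforce digests itself through its hash-checking mode (--require-hashes with hashes pinned in a requirements file).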
