Universal AnglE Embedding
WhereIsAI/UAE-Large-V1 is licensed under MIT. Feel free to use it in any scenario.
If you use it in an academic paper, please cite us using the citation info in the Citation section.
Follow us on:
- GitHub: https://github.com/SeanLee97/AnglE.
- Preprint Paper: AnglE-optimized Text Embeddings
- Conference Paper: AoE: Angle-optimized Embeddings for Semantic Textual Similarity (ACL24)
- Documentation: https://angle.readthedocs.io/en/latest/index.html
You are welcome to use AnglE to train and infer powerful sentence embeddings.
Achievements
- May 16, 2024 | AnglE's paper was accepted to the ACL 2024 Main Conference.
- Dec 4, 2023 | Our universal English sentence embedding WhereIsAI/UAE-Large-V1 achieved SOTA on the MTEB Leaderboard with an average score of 64.64!
Siblings:
- WhereIsAI/UAE-Code-Large-V1: This model can be used for code or GitHub issue similarity measurement.
Usage
1. angle_emb
python -m pip install -U angle-emb
- Non-Retrieval Tasks
There is no need to specify any prompts.
from angle_emb import AnglE
from angle_emb.utils import cosine_similarity
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()
doc_vecs = angle.encode([
    'The weather is great!',
    'The weather is very good!',
    'i am going to bed'
], normalize_embedding=True)

# Compare every pair of sentences once.
for i, dv1 in enumerate(doc_vecs):
    for dv2 in doc_vecs[i+1:]:
        print(cosine_similarity(dv1, dv2))
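To illustrate why the example passes normalize_embedding=True: assuming that flag L2-normalizes each vector, cosine similarity reduces to a plain dot product. The sketch below uses only the standard library and made-up 2-d vectors in place of real model output; `dot`, `l2_normalize`, and `cosine` are hypothetical helpers, not part of angle_emb.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy 2-d vectors standing in for real embeddings (hypothetical values).
raw_vecs = [[3.0, 4.0], [6.0, 8.0], [-4.0, 3.0]]
unit_vecs = [l2_normalize(v) for v in raw_vecs]

for i in range(len(raw_vecs)):
    for j in range(i + 1, len(raw_vecs)):
        full = cosine(raw_vecs[i], raw_vecs[j])
        fast = dot(unit_vecs[i], unit_vecs[j])
        print(f"pair ({i}, {j}): cosine={full:.4f}  dot of unit vectors={fast:.4f}")
```

Both columns should agree for every pair, which is why pre-normalized embeddings are convenient when you compare many vectors.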
- Retrieval Tasks
For retrieval, apply the prompt Prompts.C to the query only, not to the documents.
from angle_emb import AnglE, Prompts
from angle_emb.utils import cosine_similarity
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()
qv = angle.encode(Prompts.C.format(text='what is the weather?'))
doc_vecs = angle.encode([
    'The weather is great!',
    'it is rainy today.',
    'i am going to bed'
])

# qv has one row because a single query was encoded.
for dv in doc_vecs:
    print(cosine_similarity(qv[0], dv))
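The loop above prints one score per document. In a retrieval setting you would usually sort the documents by that score and keep the top few. Here is a minimal sketch of that step, using made-up 2-d vectors instead of real embeddings; `rank_documents` and the toy values are hypothetical and not part of angle_emb.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def rank_documents(query_vec, doc_vecs, docs, top_k=2):
    """Return the top_k (score, document) pairs, best match first."""
    scored = [(cosine(query_vec, dv), doc) for dv, doc in zip(doc_vecs, docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


docs = ['The weather is great!', 'it is rainy today.', 'i am going to bed']

# Hypothetical vectors; in practice these come from the encode() calls above.
toy_query_vec = [1.0, 0.0]
toy_doc_vecs = [[0.8, 0.6], [1.0, 0.1], [0.0, 1.0]]

top = rank_documents(toy_query_vec, toy_doc_vecs, docs)
for score, doc in top:
    print(f"{score:.3f}  {doc}")
```

With real embeddings, pass `qv[0]` and `doc_vecs` from the example above in place of the toy vectors.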
2. sentence-transformers
from angle_emb import Prompts
from scipy import spatial  # needed for spatial.distance.cosine below
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("WhereIsAI/UAE-Large-V1").cuda()

# Apply the retrieval prompt to the query only.
qv = model.encode(Prompts.C.format(text='what is the weather?'))
doc_vecs = model.encode([
    'The weather is great!',
    'it is rainy today.',
    'i am going to bed'
])

# scipy returns cosine distance, so similarity is 1 - distance.
for dv in doc_vecs:
    print(1 - spatial.distance.cosine(qv, dv))
3. Infinity
Infinity is an MIT-licensed server for OpenAI-compatible deployment.
docker run --gpus all -v $PWD/data:/app/.cache -p "7997":"7997" \
michaelf34/infinity:latest \
v2 --model-id WhereIsAI/UAE-Large-V1 --revision "369c368f70f16a613f19f5598d4f12d9f44235d4" --dtype float16 --batch-size 32 --device cuda --engine torch --port 7997
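Once the container is running, you can query it over HTTP. The sketch below is a standard-library client written under some assumptions: that the server is reachable at localhost:7997 (from the port mapping above), that it exposes an OpenAI-style POST /embeddings route, and that the response follows the OpenAI embeddings shape (a "data" list of items with "index" and "embedding"). The helper names are hypothetical; check the Infinity documentation for the exact route and schema of your version.

```python
import json
import urllib.request

# Assumptions: host/port taken from the docker command above; the
# /embeddings route and response shape follow the OpenAI embeddings format.
BASE_URL = "http://localhost:7997"
MODEL_ID = "WhereIsAI/UAE-Large-V1"


def build_embedding_request(texts, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) an OpenAI-style embeddings request."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(payload):
    """Return vectors in input order from an OpenAI-style response dict."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, timeout=30):
    """Send the request and return one vector per input text."""
    request = build_embedding_request(texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embeddings(json.load(response))


# Usage, with the server running:
#   vectors = embed(['The weather is great!', 'it is rainy today.'])
```

Building the request and parsing the response are kept separate from the network call so each piece can be checked without a running server.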
Citation
If you use our pre-trained models, please consider supporting us by citing our work:
@article{li2023angle,
  title={AnglE-optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}
Model size: 0.3B params (Safetensors, tensor type F32)
Evaluation results
- MTEB AmazonCounterfactualClassification (en), test set, accuracy (self-reported): 75.552
- MTEB AmazonCounterfactualClassification (en), test set, ap (self-reported): 38.264
- MTEB AmazonCounterfactualClassification (en), test set, f1 (self-reported): 69.410
- MTEB AmazonPolarityClassification, test set, accuracy (self-reported): 92.843
- MTEB AmazonPolarityClassification, test set, ap (self-reported): 89.576
- MTEB AmazonPolarityClassification, test set, f1 (self-reported): 92.826
- MTEB AmazonReviewsClassification (en), test set, accuracy (self-reported): 48.292
- MTEB AmazonReviewsClassification (en), test set, f1 (self-reported): 47.903
