enguard/tiny-guard-2m-en-response-refusal-binary-polyguard
This model is a fine-tuned Model2Vec classifier based on minishlab/potion-base-2m. It is trained on the response-refusal-binary task from the ToxicityPrompts/PolyGuardMix dataset.
Installation
pip install model2vec[inference]
Usage
from model2vec.inference import StaticModelPipeline
model = StaticModelPipeline.from_pretrained(
"enguard/tiny-guard-2m-en-response-refusal-binary-polyguard"
)
# Each input is a single text; pass one or more texts as a list:
text = "Example sentence"
model.predict([text])
model.predict_proba([text])
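When the default decision from `predict` is too strict or too lenient for an application, the probabilities from `predict_proba` can be thresholded manually. The following is a minimal sketch, not part of Model2Vec: `flag_above_threshold` is a hypothetical helper, the label name `"FAIL"` is assumed from the confusion matrix on this card, and the probability rows are made up for illustration. It works on plain lists, so it does not depend on how the pipeline exposes its class order.

```python
from typing import List, Sequence


def flag_above_threshold(
    probabilities: Sequence[Sequence[float]],
    class_labels: Sequence[str],
    target_label: str = "FAIL",  # assumed label name, taken from the confusion matrix
    threshold: float = 0.5,
) -> List[bool]:
    """Return True for each row whose probability for target_label meets the threshold."""
    column = list(class_labels).index(target_label)
    return [row[column] >= threshold for row in probabilities]


# Made-up probability rows, one per input text, in the order given by example_labels.
example_probabilities = [[0.91, 0.09], [0.35, 0.65], [0.60, 0.40]]
example_labels = ["FAIL", "PASS"]

# A higher threshold trades recall for precision on the target label.
flags = flag_above_threshold(example_probabilities, example_labels, threshold=0.7)
print(flags)
```

With the real pipeline, the rows would come from `model.predict_proba([...])`. The class order has to be taken from the fitted classifier, which this sketch leaves to the caller.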
Why should you use these models?
- Optimized for precision to reduce false positives.
- Extremely fast inference: up to 500x faster than SetFit.
This model variant
Below is a quick overview of the model variant and core metrics.
| Field | Value |
|---|---|
| Classifies | response-refusal-binary |
| Base Model | minishlab/potion-base-2m |
| Precision | 0.9486 |
| Recall | 0.8203 |
| F1 | 0.8798 |
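The F1 score in the table is the harmonic mean of the reported precision and recall. A short check, using only the values listed above (`f1_score` is a helper defined here for illustration):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the metrics table.
reported_precision = 0.9486
reported_recall = 0.8203

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 4))  # 0.8798
```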
Confusion Matrix
| True \ Predicted | FAIL | PASS |
|---|---|---|
| FAIL | 4721 | 940 |
| PASS | 240 | 5331 |
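Precision, recall, and accuracy can also be derived directly from a confusion matrix. The sketch below does this for the counts above, assuming FAIL is the positive class (the card does not state this explicitly). The resulting ratios may differ from the summary table if that table was computed under different settings, such as another decision threshold or evaluation split.

```python
# Counts copied from the confusion matrix (rows: true label, columns: predicted label).
true_fail_pred_fail = 4721
true_fail_pred_pass = 940
true_pass_pred_fail = 240
true_pass_pred_pass = 5331

# Assumption: FAIL is treated as the positive class.
precision = true_fail_pred_fail / (true_fail_pred_fail + true_pass_pred_fail)
recall = true_fail_pred_fail / (true_fail_pred_fail + true_fail_pred_pass)

total = (
    true_fail_pred_fail
    + true_fail_pred_pass
    + true_pass_pred_fail
    + true_pass_pred_pass
)
accuracy = (true_fail_pred_fail + true_pass_pred_pass) / total

print(f"precision={precision:.4f} recall={recall:.4f} accuracy={accuracy:.4f}")
```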
Other model variants
Below is a general overview of the best-performing models for each dataset variant.
Resources
- Awesome AI Guardrails: https://github.com/enguard-ai/awesome-ai-guardails
- Model2Vec: https://github.com/MinishLab/model2vec
- Docs: https://minish.ai/packages/model2vec/introduction
Citation
If you use this model, please cite Model2Vec:
@software{minishlab2024model2vec,
author = {Stephan Tulkens and {van Dongen}, Thomas},
title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
year = {2024},
publisher = {Zenodo},
doi = {10.5281/zenodo.17270888},
url = {https://github.com/MinishLab/model2vec},
license = {MIT}
}
