enguard/medium-guard-128m-xx-prompt-jailbreak-binary-in-the-wild
This model is a fine-tuned Model2Vec classifier based on minishlab/potion-multilingual-128M for the prompt-jailbreak-binary task, trained on the TrustAIRLab/in-the-wild-jailbreak-prompts dataset.
Installation
pip install model2vec[inference]
Usage
from model2vec.inference import StaticModelPipeline
model = StaticModelPipeline.from_pretrained(
"enguard/medium-guard-128m-xx-prompt-jailbreak-binary-in-the-wild"
)
# Each input is a single text. Pass inputs as a list of strings:
text = "Example sentence"
model.predict([text])
model.predict_proba([text])
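To use the classifier as a guardrail, the predicted labels can be used to separate prompts that should be blocked from those that may pass. The following is a minimal sketch, not part of the library: `split_prompts` and `BLOCK_LABEL` are hypothetical names, and it assumes the predicted labels are the strings `FAIL` and `PASS` shown in the confusion matrix on this card.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Assumption: the label name is taken from the confusion matrix on this card.
BLOCK_LABEL = "FAIL"


def split_prompts(
    prompts: Sequence[str],
    predict: Callable[[List[str]], Iterable],
) -> Tuple[List[str], List[str]]:
    """Return (allowed, blocked) prompts based on the predicted labels."""
    # Convert each label to a plain string so array elements compare cleanly.
    labels = [str(label) for label in predict(list(prompts))]
    allowed = [p for p, label in zip(prompts, labels) if label != BLOCK_LABEL]
    blocked = [p for p, label in zip(prompts, labels) if label == BLOCK_LABEL]
    return allowed, blocked


# With the pipeline loaded as shown above, this would be called as:
# allowed, blocked = split_prompts(["Example sentence"], model.predict)
```

Passing `predict` as a callable keeps the helper independent of the model, so it can be exercised with a stub before the real pipeline is loaded.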
Why should you use these models?
- Optimized for precision to reduce false positives.
- Extremely fast inference: up to 500x faster than SetFit.
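If even fewer false positives are needed, one option is to apply a stricter threshold to the output of `predict_proba` instead of relying on the default label. The sketch below assumes the probability rows and the class order are supplied by the caller; `flag_prompts` is a hypothetical helper, the `FAIL` label name is taken from the confusion matrix on this card, and how to read the class order from the trained classifier is not covered here.

```python
from typing import List, Sequence


def flag_prompts(
    proba_rows: Sequence[Sequence[float]],
    classes: Sequence[str],
    positive: str = "FAIL",  # assumption: label name from the confusion matrix
    threshold: float = 0.5,
) -> List[bool]:
    """Flag a prompt only when the positive-class probability reaches the threshold."""
    # Locate the probability column that belongs to the positive class.
    column = list(classes).index(positive)
    return [float(row[column]) >= threshold for row in proba_rows]


# Hypothetical probabilities for two prompts, with columns ordered as `classes`.
rows = [[0.90, 0.10], [0.40, 0.60]]
flags = flag_prompts(rows, classes=["FAIL", "PASS"], threshold=0.8)
```

Raising the threshold generally trades recall for precision, so the value should be tuned on held-out data rather than chosen by hand.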
This model variant
Below is a quick overview of the model variant and core metrics.
| Field | Value |
|---|---|
| Classifies | prompt-jailbreak-binary |
| Base Model | minishlab/potion-multilingual-128M |
| Precision | 0.9240 |
| Recall | 0.8294 |
| F1 | 0.8741 |
Confusion Matrix
| True \ Predicted | FAIL | PASS |
|---|---|---|
| FAIL | 256 | 53 |
| PASS | 20 | 292 |
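Precision, recall and F1 can be derived from a confusion matrix like the one above. The sketch below assumes `FAIL` is the positive class and that rows are true labels and columns are predictions, as the table header indicates. With these counts it yields roughly 0.9275, 0.8285 and 0.8752, which is close to but not identical to the reported metrics; the card does not state how the reported values were computed.

```python
# Counts from the confusion matrix (rows: true label, columns: predicted label).
# Assumption: FAIL is treated as the positive class.
tp = 256  # true FAIL, predicted FAIL
fn = 53   # true FAIL, predicted PASS
fp = 20   # true PASS, predicted FAIL
tn = 292  # true PASS, predicted PASS

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fn + fp + tn)

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f} accuracy={accuracy:.4f}")
```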
Other model variants
Below is a general overview of the best-performing models for each dataset variant.
| Classifies | Model | Precision | Recall | F1 |
|---|---|---|---|---|
| prompt-jailbreak-binary | enguard/tiny-guard-2m-en-prompt-jailbreak-binary-in-the-wild | 0.9535 | 0.6997 | 0.8071 |
| prompt-jailbreak-binary | enguard/tiny-guard-4m-en-prompt-jailbreak-binary-in-the-wild | 0.9397 | 0.7440 | 0.8305 |
| prompt-jailbreak-binary | enguard/tiny-guard-8m-en-prompt-jailbreak-binary-in-the-wild | 0.9433 | 0.7952 | 0.8630 |
| prompt-jailbreak-binary | enguard/small-guard-32m-en-prompt-jailbreak-binary-in-the-wild | 0.9179 | 0.8396 | 0.8770 |
| prompt-jailbreak-binary | enguard/medium-guard-128m-xx-prompt-jailbreak-binary-in-the-wild | 0.9240 | 0.8294 | 0.8741 |
Resources
- Awesome AI Guardrails: https://github.com/enguard-ai/awesome-ai-guardails
- Model2Vec: https://github.com/MinishLab/model2vec
- Docs: https://minish.ai/packages/model2vec/introduction
Citation
If you use this model, please cite Model2Vec:
@software{minishlab2024model2vec,
author = {Stephan Tulkens and {van Dongen}, Thomas},
title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
year = {2024},
publisher = {Zenodo},
doi = {10.5281/zenodo.17270888},
url = {https://github.com/MinishLab/model2vec},
license = {MIT}
}