URL: https://huggingface.co/enguard/small-guard-32m-en-prompt-jailbreak-binary-in-the-wild

enguard/small-guard-32m-en-prompt-jailbreak-binary-in-the-wild

This model is a fine-tuned Model2Vec classifier based on minishlab/potion-base-32m. It predicts the prompt-jailbreak-binary label and was trained on the TrustAIRLab/in-the-wild-jailbreak-prompts dataset.

Installation

pip install "model2vec[inference]"

Usage

from model2vec.inference import StaticModelPipeline

model = StaticModelPipeline.from_pretrained(
 "enguard/small-guard-32m-en-prompt-jailbreak-binary-in-the-wild"
)


# Each input is a single text (one string per example); pass inputs as a list:
text = "Example sentence"

model.predict([text])        # predicted label for each text
model.predict_proba([text])  # class probabilities for each text
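
Because this model is tuned for precision, one option is to flag a prompt only when the probability of the jailbreak class clears a chosen cutoff. The sketch below is a minimal illustration under stated assumptions: that the loaded pipeline exposes a scikit-learn style `classes_` attribute, and that the jailbreak label is named `FAIL` as in the confusion matrix on this card. `flag_prompts`, `positive_label`, and the 0.8 cutoff are hypothetical names and values, not part of Model2Vec.

```python
def flag_prompts(pipeline, texts, positive_label="FAIL", threshold=0.8):
    """Return (text, probability, flagged) triples, one per input text.

    `pipeline` is any object with `predict_proba(list_of_str)` and `classes_`,
    for example a loaded StaticModelPipeline (assumed interface).
    """
    classes = [str(c) for c in pipeline.classes_]
    if positive_label not in classes:
        raise ValueError(f"{positive_label!r} not in model classes {classes}")
    column = classes.index(positive_label)

    results = []
    for text, row in zip(texts, pipeline.predict_proba(texts)):
        prob = float(row[column])  # probability of the assumed jailbreak class
        results.append((text, prob, prob >= threshold))
    return results
```

With the model loaded as above, `flag_prompts(model, ["Example sentence"])` would return one triple per input. Raising `threshold` generally trades recall for precision.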

Why should you use these models?

  • Optimized for precision to reduce false positives.
  • Extremely fast inference: up to 500x faster than SetFit.

This model variant

Below is a quick overview of the model variant and core metrics.

Field        Value
Classifies   prompt-jailbreak-binary
Base Model   minishlab/potion-base-32m
Precision    0.9179
Recall       0.8396
F1           0.8770

Confusion Matrix

True \ Predicted   FAIL   PASS
FAIL                258     51
PASS                 22    290
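
To show how a confusion matrix maps to precision, recall, and F1, the sketch below recomputes them from the four counts above. It assumes FAIL is the positive class, with rows as true labels and columns as predictions; `binary_metrics` is a hypothetical helper. Computed this way the values come out near, but not identical to, the reported metrics, which may have been produced from a different evaluation run or averaging scheme.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Counts read from the matrix, assuming FAIL is the positive class:
# true FAIL -> predicted FAIL (tp), predicted PASS (fn)
# true PASS -> predicted FAIL (fp), predicted PASS (tn)
metrics = binary_metrics(tp=258, fn=51, fp=22, tn=290)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```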

Other model variants

Below is a general overview of the best-performing models for each dataset variant.

Classifies                Model                                                              Precision   Recall   F1
prompt-jailbreak-binary   enguard/tiny-guard-2m-en-prompt-jailbreak-binary-in-the-wild       0.9535      0.6997   0.8071
prompt-jailbreak-binary   enguard/tiny-guard-4m-en-prompt-jailbreak-binary-in-the-wild       0.9397      0.7440   0.8305
prompt-jailbreak-binary   enguard/tiny-guard-8m-en-prompt-jailbreak-binary-in-the-wild       0.9433      0.7952   0.8630
prompt-jailbreak-binary   enguard/small-guard-32m-en-prompt-jailbreak-binary-in-the-wild     0.9179      0.8396   0.8770
prompt-jailbreak-binary   enguard/medium-guard-128m-xx-prompt-jailbreak-binary-in-the-wild   0.9240      0.8294   0.8741
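
The table can also guide which variant to deploy. As a rough sketch, the snippet below picks the variant with the best value of one metric among those meeting a minimum recall. The numbers are copied from the table above; `Variant`, `VARIANTS`, and `pick_variant` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    name: str
    precision: float
    recall: float
    f1: float


# Metrics copied from the overview table on this card.
VARIANTS = [
    Variant("enguard/tiny-guard-2m-en-prompt-jailbreak-binary-in-the-wild", 0.9535, 0.6997, 0.8071),
    Variant("enguard/tiny-guard-4m-en-prompt-jailbreak-binary-in-the-wild", 0.9397, 0.7440, 0.8305),
    Variant("enguard/tiny-guard-8m-en-prompt-jailbreak-binary-in-the-wild", 0.9433, 0.7952, 0.8630),
    Variant("enguard/small-guard-32m-en-prompt-jailbreak-binary-in-the-wild", 0.9179, 0.8396, 0.8770),
    Variant("enguard/medium-guard-128m-xx-prompt-jailbreak-binary-in-the-wild", 0.9240, 0.8294, 0.8741),
]


def pick_variant(min_recall: float = 0.0, key: str = "precision") -> Optional[Variant]:
    """Best variant by `key` among those with recall >= min_recall, or None."""
    eligible = [v for v in VARIANTS if v.recall >= min_recall]
    return max(eligible, key=lambda v: getattr(v, key), default=None)


best_f1 = pick_variant(key="f1")
precise_enough = pick_variant(min_recall=0.79, key="precision")
```

Here `best_f1` is the 32m model described on this card, while requiring a recall of at least 0.79 and maximizing precision selects the 8m variant.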

Citation

If you use this model, please cite Model2Vec:

@software{minishlab2024model2vec,
 author = {Stephan Tulkens and {van Dongen}, Thomas},
 title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
 year = {2024},
 publisher = {Zenodo},
 doi = {10.5281/zenodo.17270888},
 url = {https://github.com/MinishLab/model2vec},
 license = {MIT}
}