long context models for MoM multilingual classifier (domain, jailbreak, pii, factual, feedback) • 12 items • Updated
mmBERT-32K Jailbreak Detector (LoRA)
LoRA adapter for jailbreak/prompt injection detection based on mmBERT-32K-YaRN.
Model Details
- Base Model: llm-semantic-router/mmbert-32k-yarn
- LoRA Rank: 48
- LoRA Alpha: 96
- Training: 8 epochs with heavy short-pattern augmentation
Performance
- Validation Accuracy: 98.16%
- F1 Score: 98.15%
- Precision: 98.36%
- Recall: 97.95%
Key Improvements
This model includes heavy oversampling of short jailbreak patterns to improve generalization:
- Detects short patterns like "DAN", "jailbreak", "Developer mode" with 100% confidence
- Properly handles both short and long jailbreak attempts
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
base_model = "llm-semantic-router/mmbert-32k-yarn"
lora_path = "llm-semantic-router/mmbert32k-jailbreak-detector-lora"
tokenizer = AutoTokenizer.from_pretrained(lora_path)
base = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)
model = PeftModel.from_pretrained(base, lora_path)
- Downloads last month
- 5
Model tree for llm-semantic-router/mmbert32k-jailbreak-detector-lora
Base model
jhu-clsp/mmBERT-base Quantized
llm-semantic-router/mmbert-32k-yarn