URL: https://huggingface.co/cybersectony/phishing-email-detection-distilbert_v2.4.1


A DistilBERT-Based Phishing Email Detection Model

Model Overview

This model is based on DistilBERT and has been fine-tuned for multilabel classification of emails and URLs as safe or potentially phishing.

Key Specifications

  • Base Architecture: DistilBERT
  • Task: Multilabel Classification
  • Fine-tuning Framework: Hugging Face Trainer API
  • Training Duration: 3 epochs

Performance Metrics

  • Accuracy: 99.58%
  • F1-score: 99.579%
  • Precision: 99.583%
  • Recall: 99.58%
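
The card does not say how these figures were computed. As a rough sketch, assuming precision, recall, and F1 are support-weighted averages over the classes (an assumption, not confirmed by the source), the calculation could look like this in plain Python. Under weighted averaging, recall always equals accuracy, which is at least consistent with the identical accuracy and recall values listed above. The function name `weighted_metrics` and the toy labels are hypothetical.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1 as fractions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    total = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total

    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / total  # weight each class by its share of the data
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels for illustration only (class ids 0..3); not taken from the real dataset.
y_true = [0, 0, 1, 1, 2, 3]
y_pred = [0, 1, 1, 1, 2, 3]
for name, value in weighted_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.2%}")
```

Multiplying each fraction by 100 gives values on the same scale as the metrics listed above.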

Dataset Details

The model was trained on a custom dataset of emails and URLs labeled as legitimate or phishing. The dataset is available at cybersectony/PhishingEmailDetectionv2.0 on the Hugging Face Hub.

Usage Guide

Installation

pip install transformers
pip install torch

Quick Start

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("cybersectony/phishing-email-detection-distilbert_v2.4.1")
model = AutoModelForSequenceClassification.from_pretrained("cybersectony/phishing-email-detection-distilbert_v2.4.1")

def predict_email(email_text):
    # Preprocess and tokenize (inputs longer than 512 tokens are truncated)
    inputs = tokenizer(
        email_text,
        return_tensors="pt",
        truncation=True,
        max_length=512
    )

    # Get prediction
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

    # Get probabilities for each class
    probs = predictions[0].tolist()

    # Create labels dictionary
    labels = {
        "legitimate_email": probs[0],
        "phishing_url": probs[1],
        "legitimate_url": probs[2],
        "phishing_url_alt": probs[3]
    }

    # Determine the most likely classification
    max_label = max(labels.items(), key=lambda x: x[1])

    return {
        "prediction": max_label[0],
        "confidence": max_label[1],
        "all_probabilities": labels
    }
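
To make the post-processing step concrete, here is a small torch-free sketch of what `predict_email` does with the model output: it applies a softmax to the four logits and picks the label with the highest probability. The logits in the example are made up, the helper names `softmax` and `classify_logits` are hypothetical, and the label order is copied from the function above.

```python
import math

# Label order copied from predict_email; index i corresponds to logit i.
LABELS = ["legitimate_email", "phishing_url", "legitimate_url", "phishing_url_alt"]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_logits(logits):
    """Mirror the return structure of predict_email for a list of four logits."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probabilities = dict(zip(LABELS, softmax(logits)))
    best = max(probabilities, key=probabilities.get)
    return {
        "prediction": best,
        "confidence": probabilities[best],
        "all_probabilities": probabilities,
    }

# Made-up logits for illustration; a real call would use outputs.logits[0].tolist().
result = classify_logits([-1.2, 3.4, -0.5, 0.8])
print(result["prediction"], f"{result['confidence']:.2%}")
```

Because softmax normalizes across all four classes, the reported confidence is relative to the other classes rather than an independent per-label score.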

Example Usage

# Example usage
email = """
Dear User,
Your account security needs immediate attention. Please verify your credentials.
Click here: http://suspicious-link.com
"""

result = predict_email(email)
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.2%}")
print("\nAll probabilities:")
for label, prob in result['all_probabilities'].items():
    print(f"{label}: {prob:.2%}")
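
For a simple safe-or-phishing decision, the four class probabilities can be collapsed into a single phishing score. The sketch below is one way a caller might do this and is not part of the model card: the helpers `phishing_score` and `is_phishing`, the 0.5 threshold, and the choice to count every label starting with `phishing` are all assumptions. The input is the `all_probabilities` dictionary returned by `predict_email`.

```python
def phishing_score(all_probabilities):
    """Sum the probabilities of every label whose name starts with 'phishing'."""
    return sum(
        prob for label, prob in all_probabilities.items()
        if label.startswith("phishing")
    )

def is_phishing(all_probabilities, threshold=0.5):
    """Flag the input when the combined phishing probability reaches the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return phishing_score(all_probabilities) >= threshold

# Hand-written probabilities for illustration; in practice pass
# predict_email(email)["all_probabilities"].
example = {
    "legitimate_email": 0.10,
    "phishing_url": 0.55,
    "legitimate_url": 0.05,
    "phishing_url_alt": 0.30,
}
print(f"Phishing score: {phishing_score(example):.2%}")
print(f"Flagged: {is_phishing(example)}")
```

A lower threshold flags more messages at the cost of more false alarms, so the value is worth tuning on labeled data rather than taking 0.5 as given.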

Model size: 67M params (Safetensors, tensor type F32)
