detoxify 0.5.2

pip install detoxify

A python library for detecting toxic comments

Meta
  • License: Apache Software License
  • Author: Unitary
  • Requires: Python >=3.7

Project description

🙊 Detoxify

Toxic Comment Classification with ⚡ Pytorch Lightning and 🤗 Transformers

News & Updates

22-10-2021: New improved multilingual model & standardised class names

  • Updated the multilingual model weights used by Detoxify with a model trained on the translated data from the 2nd Jigsaw challenge (as well as the 1st). This model has also been trained to minimise bias and now returns the same categories as the unbiased model. New best AUC score on the test set: 92.11 (89.71 before).
  • All detoxify models now return consistent class names (e.g. "identity_attack" replaces "identity_hate" in the original model to match the unbiased classes).

03-09-2021: New improved unbiased model

  • Updated the unbiased model weights used by Detoxify with a model trained on both datasets from the first 2 Jigsaw challenges. New best score on the test set: 93.74 (93.64 before).

15-02-2021: Detoxify featured in Scientific American!

14-01-2021: Lightweight models

  • Added smaller models trained with Albert for the original and unbiased models! These can be accessed in the same way by passing original-small or unbiased-small as the model name. The original-small achieved a mean AUC score of 98.28 (98.64 before) and the unbiased-small achieved a final score of 93.36 (93.64 before).

Description

Trained models & code to predict toxic comments on 3 Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, Multilingual toxic comment classification.

Built by Laura Hanu at Unitary, where we are working to stop harmful content online by interpreting visual content in context.

Dependencies:

  • For inference:
    • 🤗 Transformers
    • ⚡ Pytorch lightning
  • For training, you will also need:
    • Kaggle API (to download data)

Challenge | Year | Goal | Original Data Source | Detoxify Model Name | Top Kaggle Leaderboard Score % | Detoxify Score %
Toxic Comment Classification Challenge | 2018 | build a multi-headed model that's capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate. | Wikipedia Comments | original | 98.86 | 98.64
Jigsaw Unintended Bias in Toxicity Classification | 2019 | build a model that recognizes toxicity and minimizes this type of unintended bias with respect to mentions of identities. You'll be using a dataset labeled for identity mentions and optimizing a metric designed to measure unintended bias. | Civil Comments | unbiased | 94.73 | 93.74
Jigsaw Multilingual Toxic Comment Classification | 2020 | build effective multilingual models | Wikipedia Comments + Civil Comments | multilingual | 95.36 | 92.11

It is also worth noting that the top leaderboard scores were achieved using model ensembles. The purpose of this library was to build something user-friendly and straightforward to use.

Multilingual model language breakdown

Language Subgroup | Subgroup size | Subgroup AUC Score %
🇮🇹 it | 8494 | 89.18
🇫🇷 fr | 10920 | 89.61
🇷🇺 ru | 10948 | 89.81
🇵🇹 pt | 11012 | 91.00
🇪🇸 es | 8438 | 92.74
🇹🇷 tr | 14000 | 97.19
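
Because the subgroups differ in size, a plain average of the per-language scores weights small and large subgroups equally. The short sketch below compares a plain average with a size-weighted one using only the figures from the table above. Neither number is expected to reproduce the overall test-set AUC reported earlier, since an AUC computed over pooled data is not an average of subgroup AUCs.

```python
# (subgroup size, subgroup AUC %) copied from the language breakdown table.
subgroups = {
    "it": (8494, 89.18),
    "fr": (10920, 89.61),
    "ru": (10948, 89.81),
    "pt": (11012, 91.00),
    "es": (8438, 92.74),
    "tr": (14000, 97.19),
}

# Plain (macro) average: every language counts equally.
macro_average = sum(auc for _, auc in subgroups.values()) / len(subgroups)

# Size-weighted average: larger subgroups count proportionally more.
total_size = sum(size for size, _ in subgroups.values())
weighted_average = sum(size * auc for size, auc in subgroups.values()) / total_size

print(f"total examples:   {total_size}")
print(f"macro average:    {macro_average:.2f}")
print(f"weighted average: {weighted_average:.2f}")
```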

Limitations and ethical considerations

If words that are associated with swearing, insults or profanity are present in a comment, it is likely that it will be classified as toxic, regardless of the tone or the intent of the author (e.g. humorous or self-deprecating). This could present some biases towards already vulnerable minority groups.

The intended use of this library is for research purposes, fine-tuning on carefully constructed datasets that reflect real world demographics and/or to aid content moderators in flagging harmful content more quickly.

Some useful resources about the risk of different biases in toxicity or hate speech detection are:

Quick prediction

The multilingual model has been trained on 7 different languages, so it should only be tested on English, French, Spanish, Italian, Portuguese, Turkish or Russian.

# install detoxify (run this in a shell)

pip install detoxify

# then, in Python:

from detoxify import Detoxify

# each model takes in either a string or a list of strings

results = Detoxify('original').predict('example text')

results = Detoxify('unbiased').predict(['example text 1','example text 2'])

input_text = ['example text','exemple de texte','texto de ejemplo','testo di esempio','texto de exemplo','örnek metin','пример текста']

results = Detoxify('multilingual').predict(input_text)

# to specify the device the model will be allocated on (defaults to cpu), accepts any torch.device input

model = Detoxify('original', device='cuda')

# optional to display results nicely (will need to pip install pandas)

import pandas as pd

# input_text must be the same list that was passed to predict() above
print(pd.DataFrame(results, index=input_text).round(5))

For more details check the Prediction section.
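
As a follow-up to the quick prediction above, the sketch below shows one way to turn scores into per-text flags. It assumes the prediction result is a dictionary mapping each label name to a single score (for one string) or a list of scores (for a list of strings). The helper `flag_labels`, the 0.5 threshold, and the sample scores are illustrative assumptions, not part of the library.

```python
from typing import Dict, List, Union

Scores = Dict[str, Union[float, List[float]]]


def flag_labels(results: Scores, threshold: float = 0.5) -> List[List[str]]:
    """Return, for each input text, the labels whose score meets the threshold."""
    # Normalise the single-string case (plain floats) to one-element lists.
    as_lists = {
        label: list(score) if isinstance(score, (list, tuple)) else [score]
        for label, score in results.items()
    }
    n_texts = len(next(iter(as_lists.values()), []))
    return [
        [label for label, scores in as_lists.items() if scores[i] >= threshold]
        for i in range(n_texts)
    ]


# Hypothetical scores, shaped like a prediction for two input texts.
sample_results = {
    "toxicity": [0.02, 0.91],
    "insult": [0.01, 0.77],
    "threat": [0.00, 0.03],
}
flags = flag_labels(sample_results, threshold=0.5)
print(flags)
```

A suitable threshold depends on the application; given the limitations described earlier, flagged items are better treated as candidates for human review than as final decisions.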

Labels

All challenges have a toxicity label. The toxicity labels represent the aggregate ratings of up to 10 annotators according to the following schema:

  • Very Toxic (a very hateful, aggressive, or disrespectful comment that is very likely to make you leave a discussion or give up on sharing your perspective)
  • Toxic (a rude, disrespectful, or unreasonable comment that is somewhat likely to make you leave a discussion or give up on sharing your perspective)
  • Hard to Say
  • Not Toxic

More information about the labelling schema can be found here.
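
To illustrate what an aggregate rating could look like, here is a minimal sketch that turns individual annotator ratings into a single score. The exact aggregation rule is not given in this description, so the rule used here (the fraction of annotators who chose Toxic or Very Toxic) is an assumption, as are the function name and the sample ratings.

```python
from collections import Counter
from typing import Iterable

# Ratings counted as "toxic" for the purpose of this sketch (an assumption).
TOXIC_RATINGS = {"Very Toxic", "Toxic"}


def toxicity_fraction(ratings: Iterable[str]) -> float:
    """Fraction of annotators who rated the comment Toxic or Very Toxic."""
    counts = Counter(ratings)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one rating is required")
    return sum(counts[rating] for rating in TOXIC_RATINGS) / total


# Hypothetical ratings from five annotators for one comment.
ratings = ["Toxic", "Not Toxic", "Very Toxic", "Not Toxic", "Hard to Say"]
score = toxicity_fraction(ratings)
print(score)
```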

Toxic Comment Classification Challenge

This challenge includes the following labels:

  • toxic
  • severe_toxic
  • obscene
  • threat
  • insult
  • identity_hate

Jigsaw Unintended Bias in Toxicity Classification

This challenge has 2 types of labels: the main toxicity labels and some additional identity labels that represent the identities mentioned in the comments.

Only identities with more than 500 examples in the test set (combined public and private) are included during training as additional labels and in the evaluation calculation.

  • toxicity
  • severe_toxicity
  • obscene
  • threat
  • insult
  • identity_attack
  • sexual_explicit
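
Since all models now return the standardised class names, older code or data that uses the original challenge names may need renaming. The sketch below maps the original names to the standardised ones. Only the `identity_hate` to `identity_attack` pair is stated explicitly in the news section; the other two pairs are inferred by comparing the two label lists above and should be treated as assumptions, as should the helper name.

```python
from typing import Any, Dict

# identity_hate -> identity_attack is documented; the other pairs are inferred.
ORIGINAL_TO_STANDARD = {
    "toxic": "toxicity",
    "severe_toxic": "severe_toxicity",
    "identity_hate": "identity_attack",
}


def standardise_labels(scores: Dict[str, Any]) -> Dict[str, Any]:
    """Rename original-challenge label keys, leaving other keys untouched."""
    return {ORIGINAL_TO_STANDARD.get(name, name): value for name, value in scores.items()}


# Hypothetical scores keyed by the original challenge label names.
old_style = {"toxic": 0.9, "obscene": 0.2, "identity_hate": 0.1}
new_style = standardise_labels(old_style)
print(new_style)
```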

Identity labels used:

  • male
  • female
  • homosexual_gay_or_lesbian
  • christian
  • jewish
  • muslim
  • black
  • white
  • psychiatric_or_mental_illness

A complete list of all the identity labels available can be found here.
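
The selection rule described above (keep only identities with more than 500 examples) can be sketched as a simple filter. The counts below are made up for illustration and the function name is hypothetical; this is not the project's preprocessing code.

```python
from typing import Dict, List


def frequent_identities(counts: Dict[str, int], min_examples: int = 500) -> List[str]:
    """Return identities with strictly more than `min_examples` examples."""
    return sorted(name for name, n in counts.items() if n > min_examples)


# Hypothetical example counts per identity column.
hypothetical_counts = {"male": 2100, "female": 2600, "buddhist": 40, "hindu": 500}
kept = frequent_identities(hypothetical_counts)
print(kept)
```

Note that "more than 500" is read here as a strict inequality, so an identity with exactly 500 examples is dropped.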

Jigsaw Multilingual Toxic Comment Classification

Since this challenge combines the data from the previous 2 challenges, it includes all labels from above. However, the final evaluation is only on:

  • toxicity

How to run

First, install dependencies:

# clone project

git clone https://github.com/unitaryai/detoxify

# create virtual env

python3 -m venv toxic-env
source toxic-env/bin/activate

# install project

pip install -e detoxify
cd detoxify

# for training
pip install -r requirements.txt

Prediction

Trained models summary:

Model name | Transformer type | Data from
original | bert-base-uncased | Toxic Comment Classification Challenge
unbiased | roberta-base | Unintended Bias in Toxicity Classification
multilingual | xlm-roberta-base | Multilingual Toxic Comment Classification

For a quick prediction, you can run the example script on a single comment or on a .txt file containing a list of comments.

# load model via torch.hub

python run_prediction.py --input 'example' --model_name original

# load model from checkpoint path

python run_prediction.py --input 'example' --from_ckpt_path model_path

# save results to a .csv file

python run_prediction.py --input test_set.txt --model_name original --save_to results.csv

# to see usage

python run_prediction.py --help
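
For readers who want to reproduce the file-in, CSV-out flow in their own code, the sketch below reads comments from a text file and writes per-label scores to a CSV. It is not the implementation of run_prediction.py: the one-comment-per-line file layout, the CSV column layout, both helper names, and the placeholder scores are assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def read_comments(path: Path) -> List[str]:
    """Read one comment per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def write_results_csv(path: Path, comments: List[str], results: Dict[str, List[float]]) -> None:
    """Write one row per comment with one column per label."""
    labels = list(results)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["comment", *labels])
        for i, comment in enumerate(comments):
            writer.writerow([comment, *(results[label][i] for label in labels)])


with tempfile.TemporaryDirectory() as tmp:
    txt_path = Path(tmp) / "test_set.txt"
    txt_path.write_text("first comment\n\nsecond comment\n", encoding="utf-8")
    comments = read_comments(txt_path)

    # Hypothetical scores standing in for real model output.
    fake_results = {"toxicity": [0.01, 0.02], "insult": [0.0, 0.01]}
    csv_path = Path(tmp) / "results.csv"
    write_results_csv(csv_path, comments, fake_results)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))

print(rows[0])
```

In real use, `fake_results` would be replaced by the dictionary returned from a prediction on `comments`.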

Checkpoints can be downloaded from the latest release or via the Pytorch hub API with the following names:

  • toxic_bert
  • unbiased_toxic_roberta
  • multilingual_toxic_xlm_r

import torch

model = torch.hub.load('unitaryai/detoxify', 'toxic_bert')

Importing detoxify in python:

from detoxify import Detoxify

results = Detoxify('original').predict('some text')

results = Detoxify('unbiased').predict(['example text 1','example text 2'])

input_text = ['example text','exemple de texte','texto de ejemplo','testo di esempio','texto de exemplo','örnek metin','пример текста']

results = Detoxify('multilingual').predict(input_text)

# to display results nicely

import pandas as pd

# input_text must be the same list that was passed to predict() above
print(pd.DataFrame(results, index=input_text).round(5))

Training

If you do not already have a Kaggle account:

  • you need to create one to be able to download the data

  • go to My Account and click on Create New API Token - this will download a kaggle.json file

  • make sure this file is located in ~/.kaggle
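
Before downloading the data, it can help to confirm that the token file is where the steps above say it should be. The sketch below checks for `kaggle.json` inside a `.kaggle` directory under a given home directory. The function name is hypothetical, and the demonstration uses a temporary directory in place of a real home directory so that it runs anywhere.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_kaggle_token(home: Optional[Path] = None) -> Optional[Path]:
    """Return the path to ~/.kaggle/kaggle.json if it exists, else None."""
    home = Path.home() if home is None else Path(home)
    token = home / ".kaggle" / "kaggle.json"
    return token if token.is_file() else None


# Demonstration against a throwaway directory standing in for the home directory.
with tempfile.TemporaryDirectory() as fake_home:
    missing = find_kaggle_token(Path(fake_home))

    kaggle_dir = Path(fake_home) / ".kaggle"
    kaggle_dir.mkdir()
    (kaggle_dir / "kaggle.json").write_text("{}", encoding="utf-8")
    found = find_kaggle_token(Path(fake_home))

print("before creating the file:", missing)
print("after creating the file:", found.name if found else None)
```

Calling `find_kaggle_token()` with no argument checks the current user's real home directory.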

# create data directory

mkdir jigsaw_data
cd jigsaw_data

# download data

kaggle competitions download -c jigsaw-toxic-comment-classification-challenge

kaggle competitions download -c jigsaw-unintended-bias-in-toxicity-classification

kaggle competitions download -c jigsaw-multilingual-toxic-comment-classification

Start Training

Toxic Comment Classification Challenge

# combine test.csv and test_labels.csv
python preprocessing_utils.py --test_csv jigsaw_data/jigsaw-toxic-comment-classification-challenge/test.csv --update_test

python train.py --config configs/Toxic_comment_classification_BERT.json

Unintended Bias in Toxicity Challenge

python train.py --config configs/Unintended_bias_toxic_comment_classification_RoBERTa_combined.json

Multilingual Toxic Comment Classification

The translated data (source 1, source 2) can be downloaded from Kaggle in French, Spanish, Italian, Portuguese, Turkish, and Russian (the languages available in the test set).

# combine test.csv and test_labels.csv
python preprocessing_utils.py --test_csv jigsaw_data/jigsaw-multilingual-toxic-comment-classification/test.csv --update_test

python train.py --config configs/Multilingual_toxic_comment_classification_XLMR.json

Monitor progress with tensorboard

tensorboard --logdir=./saved

Model Evaluation

Toxic Comment Classification Challenge

This challenge is evaluated on the mean AUC score of all the labels.

python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
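
To make the metric concrete, here is a small dependency-free sketch of a mean AUC over labels. It is not the project's evaluate.py: the AUC is computed directly as the share of positive/negative pairs that are ranked correctly (ties count as half), and the function names and toy data are assumptions for illustration.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count correctly ordered pairs; a tie contributes half a pair.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def mean_auc(labels: Dict[str, Sequence[int]], scores: Dict[str, Sequence[float]]) -> float:
    """Average the per-label AUCs with equal weight per label."""
    return sum(roc_auc(labels[name], scores[name]) for name in labels) / len(labels)


# Toy ground truth and predictions for two labels and four comments.
toy_labels = {"toxic": [0, 0, 1, 1], "insult": [0, 1, 0, 1]}
toy_scores = {"toxic": [0.1, 0.4, 0.35, 0.8], "insult": [0.2, 0.9, 0.1, 0.7]}
result = mean_auc(toy_labels, toy_scores)
print(result)
```

The pairwise formulation is quadratic in the number of examples, so it suits small checks like this one; a library implementation is preferable for full test sets.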

Unintended Bias in Toxicity Challenge

This challenge is evaluated on a novel bias metric that combines different AUC scores to balance overall performance. More information on this metric here.

python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv

# to get the final bias metric
python model_eval/compute_bias_metric.py
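
The sketch below shows how such a combined metric can be assembled once the individual AUC scores are known. It follows the competition's published description as we understand it: the overall AUC and three families of per-identity bias AUCs (subgroup, BPSN, BNSP) are each weighted 0.25, and each family is summarised by a generalised power mean with p = -5. These constants, the function names, and the toy values are assumptions here and may differ from what model_eval/compute_bias_metric.py does.

```python
from typing import Dict, Sequence


def power_mean(values: Sequence[float], p: float = -5.0) -> float:
    """Generalised mean; a negative p pulls the result towards the lowest values."""
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


def final_bias_metric(
    overall_auc: float,
    bias_aucs: Dict[str, Sequence[float]],
    p: float = -5.0,
    weight: float = 0.25,
) -> float:
    """Combine the overall AUC with the power mean of each bias AUC family."""
    bias_term = sum(weight * power_mean(values, p) for values in bias_aucs.values())
    return weight * overall_auc + bias_term


# Toy per-identity AUCs for the three bias families (made-up numbers).
toy_bias_aucs = {
    "subgroup": [0.90, 0.90, 0.90],
    "bpsn": [0.90, 0.90, 0.90],
    "bnsp": [0.90, 0.90, 0.90],
}
combined = final_bias_metric(0.90, toy_bias_aucs)
print(round(combined, 4))
```

With a negative exponent, one badly served identity group lowers the family score more than an arithmetic mean would, which is how a metric of this kind penalises uneven performance across identities.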

Multilingual Toxic Comment Classification

This challenge is evaluated on the AUC score of the main toxic label.

python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv

Citation

@misc{Detoxify,
 title={Detoxify},
 author={Hanu, Laura and {Unitary team}},
 howpublished={Github. https://github.com/unitaryai/detoxify},
 year={2020}
}

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

detoxify-0.5.2.tar.gz (13.8 kB view details)

Uploaded Source

Built Distribution

detoxify-0.5.2-py3-none-any.whl (12.1 kB view details)

Uploaded Python 3

File details

Details for the file detoxify-0.5.2.tar.gz.

File metadata

  • Download URL: detoxify-0.5.2.tar.gz
  • Upload date:
  • Size: 13.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for detoxify-0.5.2.tar.gz
Algorithm Hash digest
SHA256 c119d0b47545bb076190ff583faad9aa3bb0a90ac01c7c4144758606d21da5b1
MD5 61b774b573b1896d7a2e6e4409e58ed3
BLAKE2b-256 ab1f3b0a6cba11a0e9e2530d6021bb2f723b2c5b6653244e6eda47df80356acc

See more details on using hashes here.

File details

Details for the file detoxify-0.5.2-py3-none-any.whl.

File metadata

  • Download URL: detoxify-0.5.2-py3-none-any.whl
  • Upload date:
  • Size: 12.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for detoxify-0.5.2-py3-none-any.whl
Algorithm Hash digest
SHA256 e6135a2d85ad17bfb19d8cf53756a1b499e6d9c03cb1976ddefdc013bf29d32a
MD5 79fe042e2f113b7ca5a108624e7902fe
BLAKE2b-256 1f5c94f765f08fe52473bc945284b7912032c099d0737cb8ad9235deb3b40d35

See more details on using hashes here.
