Toxic Comment Classification using BERT

Last Updated : 28 Apr, 2025

Users of online communication platforms such as Facebook, Instagram and YouTube frequently encounter abuse, harassment, and insults from other users, and as a result many of them stop expressing their ideas and opinions.

What is the solution?

The solution to this problem is to build a model that can identify the kind of toxicity a comment contains, such as threats, obscenity, insults, or racism, and thereby promote a peaceful environment for online dialogue.

In this article, we will look at multi-label toxic comment classification and build a model that classifies comments into several toxicity labels.

What is Toxic comment classification?

The toxicity class refers to any comment or text containing offensive or hurtful words. This can involve insults, slurs or other offensive language.

Supervised classification problems can be divided into three groups based on how many categories there are and how many of them a single example can belong to:

1. Binary classification:

It is a type of supervised machine-learning problem that classifies data into two mutually exclusive groups or categories. The two categories can be classified as true and false, 0 and 1, positive and negative, etc.

In toxic comment classification, the model is trained to predict whether a comment is toxic (class 1) or non-toxic (class 0).

Example:

"I hate you!" Predicted class: Toxic (class 1)

"I like you!" Predicted class: Non-toxic (class 0)

2. Multiclass classification:

It is a type of supervised machine-learning problem that classifies data into three or more groups/categories.

A multiclass classifier for Toxic comment classification is trained to detect various degrees of toxicity in comments, such as mild toxicity, severe toxicity, and non-toxic comments, as opposed to just differentiating between toxic and non-toxic comments (binary classification).

Example:

"I want to kill you!" Predicted class: Severe toxicity

"You are so ugly and unconfident" Predicted class: Mild toxicity

"You are a good person" Predicted class: Non-toxic

3. Multilabel classification:

Multilabel classification is a supervised machine learning approach where a single instance can be associated with multiple labels simultaneously. It allows the model to assign zero, one, or more labels to each data sample based on its characteristics.

In the context of toxic comment classification, a comment or text can be labelled with multiple toxicity categories if it contains various forms of harmful language.

Example:

"You're an idiot person, and I hope someone hits you!"

Multiple labels: offensive language (class 1), threats (class 1), hatred (class 1), non_toxic (class 0)
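To make the difference from multiclass classification concrete, here is a minimal sketch of how a multilabel model's raw scores become independent yes/no decisions. The label names and the logits are made up for illustration; the key point is that each label gets its own sigmoid and threshold, so several labels can be 1 at the same time.

```python
import math

# Hypothetical label set, used only for this illustration.
LABELS = ["offensive_language", "threat", "hatred"]


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits, threshold=0.5):
    """Turn one logit per label into an independent 0/1 decision per label."""
    return {name: int(sigmoid(z) >= threshold) for name, z in zip(LABELS, logits)}


# Made-up logits: the first two labels fire together, the third does not.
print(decode_multilabel([2.0, 1.2, -3.0]))
```

A multiclass model would instead apply a softmax across all classes and pick exactly one of them.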

Toxic Comment Classification using BERT

Let's get started!

About the dataset:

The dataset contains a large number of Wikipedia comments that human raters have labelled for toxic behaviour. The label columns are:

toxic
severe_toxic
obscene
threat
insult
identity_hate

Access the dataset: Toxic Comments dataset

Now, the coding part begins!

Prerequisite

We use PyTorch, which offers a flexible and intuitive interface for building and training deep learning models, together with the transformers library.

!pip install torch

The transformers library provides BERT (Bidirectional Encoder Representations from Transformers)

!pip install transformers

Importing necessary libraries

Load the datasets
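The loading step can be sketched with pandas as follows. The helper name `load_comments` is hypothetical, and a two-row inline CSV with placeholder text stands in for the real file so that the snippet runs on its own; only the column names are taken from the dataset description above.

```python
import io

import pandas as pd

LABEL_COLUMNS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def load_comments(source):
    """Read the comments CSV (a path or file-like object) and check its columns."""
    data = pd.read_csv(source)
    expected = ["id", "comment_text", *LABEL_COLUMNS]
    missing = [col for col in expected if col not in data.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return data


# Stand-in for the real file: placeholder rows with the same schema.
sample_csv = io.StringIO(
    "id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    "a1,placeholder clean comment,0,0,0,0,0,0\n"
    "a2,placeholder toxic comment,1,0,0,0,1,0\n"
)
data = load_comments(sample_csv)
print(data.head())
```

With the downloaded dataset, the same helper would be called with the path to the training CSV instead of the in-memory sample.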

Output:

 id comment_text toxic \
0 0000997932d777bf Explanation\nWhy the edits made under my usern... 0 
1 000103f0d9cfb60f D'aww! He matches this background colour I'm s... 0 
2 000113f07ec002fd Hey man, I'm really not trying to edit war. It... 0 
3 0001b41b1c6bb37e "\nMore\nI can't make any real suggestions on ... 0 
4 0001d958c54c6e35 You, sir, are my hero. Any chance you remember... 0 
 severe_toxic obscene threat insult identity_hate 
0 0 0 0 0 0 
1 0 0 0 0 0 
2 0 0 0 0 0 
3 0 0 0 0 0 
4 0 0 0 0 0

Data Visualization to Understand Class Distribution

Output:

[Figure: Distribution of label occurrences]

Checking exact values for each class
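One way to get per-label counts like those shown below is to sum each 0/1 label column and sort the result. This is a sketch on a toy frame with made-up values; `label_counts` is a hypothetical helper name.

```python
import pandas as pd

LABEL_COLUMNS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def label_counts(data: pd.DataFrame) -> pd.Series:
    """Count how many comments carry each label, smallest count first."""
    return data[LABEL_COLUMNS].sum().sort_values()


# Toy data: four comments with made-up labels.
toy = pd.DataFrame(
    {
        "toxic": [1, 1, 0, 1],
        "severe_toxic": [0, 0, 0, 0],
        "obscene": [1, 0, 0, 1],
        "threat": [0, 0, 0, 0],
        "insult": [1, 0, 0, 0],
        "identity_hate": [0, 0, 0, 0],
    }
)
counts = label_counts(toy)
print(counts)
```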

Output:

threat 478
identity_hate 1405
severe_toxic 1595
insult 7877
obscene 8449
toxic 15294
dtype: int64

Toxic and Non-Toxic Data

Let's check whether the data is balanced by comparing toxic and clean comments. We create a subset for each and then build a new data frame to visualize their distribution.
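The two subsets can be sketched as below, assuming a comment counts as toxic when at least one of the six labels is set and as clean when none is. The helper name and the toy rows are illustrative.

```python
import pandas as pd

LABEL_COLUMNS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def split_toxic_clean(data: pd.DataFrame):
    """Return (toxic, clean): rows with at least one label set, and rows with none."""
    has_any_label = data[LABEL_COLUMNS].sum(axis=1) > 0
    return data[has_any_label], data[~has_any_label]


# Toy data with placeholder text; row "c" is an insult without the toxic flag.
data = pd.DataFrame(
    {
        "id": ["a", "b", "c", "d"],
        "comment_text": ["placeholder 1", "placeholder 2", "placeholder 3", "placeholder 4"],
        "toxic": [1, 0, 0, 0],
        "severe_toxic": [0, 0, 0, 0],
        "obscene": [1, 0, 0, 0],
        "threat": [0, 0, 0, 0],
        "insult": [0, 0, 1, 0],
        "identity_hate": [0, 0, 0, 0],
    }
)
toxic_df, clean_df = split_toxic_clean(data)
print(toxic_df.shape)
print(clean_df.shape)
```

This "any label" definition is consistent with the figures in this article: the toxic subset has 16,225 rows, which is more than the 15,294 comments carrying the toxic label alone.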

Output:

[Figure: Distribution of toxic and clean comments]

We can observe that our dataset is severely imbalanced.

Let's look at the exact numbers of toxic and clean comments so that we can balance the data accordingly.

Output:

(16225, 8)
(143346, 8)

There is a huge difference in the dataset between toxic and clean comments.

Handling class imbalance

To handle the imbalanced data, we can create a new training set in which the number of toxic comments remains the same, and to match that, we will randomly sample 16,225 clean comments and include them in the training set.
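The undersampling step described above can be sketched as follows. The helper name, the fixed random seed, and the final shuffle are assumptions made for this example; the toy frames stand in for the real toxic and clean subsets.

```python
import pandas as pd


def balance_by_undersampling(toxic: pd.DataFrame, clean: pd.DataFrame, seed: int = 42):
    """Keep every toxic row and draw an equally sized random sample of clean rows."""
    clean_sampled = clean.sample(n=len(toxic), random_state=seed)
    balanced = pd.concat([toxic, clean_sampled], axis=0)
    # Shuffle so that toxic and clean rows are interleaved (an assumption, not a requirement).
    return balanced.sample(frac=1, random_state=seed).reset_index(drop=True)


# Toy stand-ins: 3 toxic rows and 10 clean rows.
toxic = pd.DataFrame({"comment_text": [f"toxic {i}" for i in range(3)], "toxic": 1})
clean = pd.DataFrame({"comment_text": [f"clean {i}" for i in range(10)], "toxic": 0})

balanced = balance_by_undersampling(toxic, clean)
print(toxic.shape, balanced.shape)
```

Undersampling discards most of the clean comments, which keeps training fast but throws away data; class weighting would be an alternative if that trade-off is unwanted.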

The new balanced data frame

Let's verify with actual figures

Output:

(16225, 8)
(16225, 8)
(32450, 8)

Now the dataset is balanced, with exactly equal numbers of toxic and clean comments, and we can proceed to tokenizing and encoding the comments using BertTokenizer.

Split Data into Training, Validation, and Testing Sets

In this step, we split the data into training, validation, and testing sets. The data is divided into training and testing sets first, and then the testing set is further split into validation and testing sets.

Now, we split the held-out set into validation and test sets
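The two-stage split can be sketched with pandas alone, as below. The 70% training fraction matches the 22,715 training rows reported later in this article (0.7 × 32,450); splitting the remainder evenly into validation and test sets, the seed, and the helper name are assumptions. scikit-learn's `train_test_split` would work equally well.

```python
import pandas as pd


def train_val_test_split(df: pd.DataFrame, train_frac: float = 0.7, seed: int = 42):
    """Shuffle once, take train_frac for training, then halve the rest into val and test."""
    shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    n_train = int(round(len(shuffled) * train_frac))
    train = shuffled.iloc[:n_train]
    held_out = shuffled.iloc[n_train:]
    n_val = len(held_out) // 2
    return train, held_out.iloc[:n_val], held_out.iloc[n_val:]


# Toy frame with 20 rows standing in for the balanced data.
toy = pd.DataFrame({"x": range(20)})
train_df, val_df, test_df = train_val_test_split(toy)
print(len(train_df), len(val_df), len(test_df))
```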

Now, we will tokenize and encode the comments and labels for the training, testing, and validation sets.

Tokenization and Encoding

Defining 'tokenize_and_encode' function to perform this task
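A minimal sketch of such a function is shown below. It assumes the tokenizer is passed in as an argument and is a Hugging Face tokenizer that can be called on a list of strings; the maximum length of 128 matches the tensor shapes printed later in this article, while the exact signature is an assumption.

```python
import torch


def tokenize_and_encode(tokenizer, comments, labels, max_length=128):
    """Encode comments into fixed-length id and mask tensors, and labels into floats."""
    encoded = tokenizer(
        list(comments),
        add_special_tokens=True,      # adds [CLS] and [SEP]
        max_length=max_length,
        padding="max_length",         # pad every comment to max_length with [PAD]
        truncation=True,              # cut comments longer than max_length
        return_attention_mask=True,   # 1 for real tokens, 0 for padding
        return_tensors="pt",
    )
    # Float labels are what a multi-label (binary cross-entropy) loss expects.
    label_tensor = torch.tensor(labels, dtype=torch.float32)
    return encoded["input_ids"], encoded["attention_mask"], label_tensor
```

It would be called once per split, for example with the training comments as a list of strings and the six label columns as a 2-D array.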

Initialize Tokenizer and Model

Now, we will initialize the BERT tokenizer with the 'bert-base-uncased' model

Initialize BERT classification Model

After this step, we will initialize the BERT model for sequence classification

An additional step speeds up processing: move the model to the GPU if one is available, or to the CPU if not.
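The three initialization steps above can be sketched together as follows. The helper names are hypothetical, and setting `problem_type` explicitly is an assumption made here so that the model uses a per-label sigmoid loss; calling `load_tokenizer_and_model` downloads the pretrained weights on first use.

```python
import torch


def pick_device() -> torch.device:
    """Prefer a CUDA GPU when PyTorch can see one, otherwise fall back to the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def load_tokenizer_and_model(model_name: str = "bert-base-uncased", num_labels: int = 6):
    """Load the tokenizer and a BERT classifier with one output per toxicity label."""
    # Imported lazily so that transformers is only loaded when this helper is called.
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertForSequenceClassification.from_pretrained(
        model_name,
        num_labels=num_labels,
        problem_type="multi_label_classification",  # binary cross-entropy per label
    )
    return tokenizer, model.to(pick_device())


device = pick_device()
print(device)
```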

Apply Tokenization and Encoding

Tokenize and Encode the comments and labels of the train, test and validation set

Output:

Training Comments : (22715,)
Input Ids : torch.Size([22715, 128])
Attention Mask : torch.Size([22715, 128])
Labels : torch.Size([22715, 6])

Let's check an encoded text with the corresponding text and labels

Output:

Training Comments -->> I have edited the text and wrote with neutral information. Please suggest what went wrong.

Input Ids -->>
 tensor([ 101, 1045, 2031, 5493, 1996, 3793, 1998, 2626, 2007, 8699, 2592, 1012,
 3531, 6592, 2054, 2253, 3308, 1012, 102, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0])

Decoded Ids -->>
 [CLS] i have edited the text and wrote with neutral information. please suggest what went wrong. [SEP]
 [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] 
[PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] 
[PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] 
[PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] 
[PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]
 [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] 
[PAD] [PAD]

Attention Mask -->>
 tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0])

Labels -->> tensor([0., 0., 0., 0., 0., 0.])

Creating PyTorch Data Loaders

Now, we will create data loaders to efficiently load the data during training, testing, and validation. The data loaders batch the input data and handle shuffling for the training data.
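A minimal sketch of the data loaders, using `TensorDataset` and `DataLoader` from PyTorch. The batch size of 32 matches the output shown below; the helper name is hypothetical, and random tensors stand in for the encoded comments.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(input_ids, attention_masks, labels, batch_size=32, shuffle=False):
    """Bundle the three tensors into a dataset and serve them in batches."""
    dataset = TensorDataset(input_ids, attention_masks, labels)
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)


# Stand-in tensors: 70 "comments" of length 128 with 6 labels each.
ids = torch.randint(0, 1000, (70, 128))
masks = torch.ones(70, 128, dtype=torch.long)
labels = torch.zeros(70, 6)

# Shuffle only the training data; validation and test order does not matter.
train_loader = make_loader(ids, masks, labels, shuffle=True)
batch_ids, batch_masks, batch_labels = next(iter(train_loader))
print(batch_ids.shape, batch_labels.shape)
```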

Let's check the train_loader data

Output:

Batch Size : 32
Each Input ids shape : torch.Size([32, 128])
Input ids :
 tensor([ 101, 2175, 3280, 1999, 1037, 2543, 1012, 1045, 2123, 2102,
 2228, 3087, 2106, 2062, 4053, 2000, 16948, 2059, 2017, 1999,
 1996, 2197, 2048, 2086, 1012, 9119, 1010, 3246, 2017, 2123,
 2102, 2272, 2067, 2007, 1037, 28407, 13997, 1006, 2029, 2017,
 2471, 5121, 2097, 999, 999, 999, 1007, 6109, 1012, 6564,
 1012, 2382, 1012, 19955, 102, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0])
Corresponding Decoded text:
 [CLS] go die in a fire. i dont think anyone did more damage to wikipedia then you in the last two years. goodbye, 
hope you dont come back with a sock puppet ( which you almost certainly will!!! ) 93. 86. 30. 194 [SEP]
 [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] 
[PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] 
[PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] 
[PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]
 [PAD] [PAD] [PAD] [PAD] [PAD]
Corresponding Attention Mask :
 tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 0, 0, 0, 0, 0, 0, 0, 0])
Corresponding Label: tensor([1., 0., 1., 0., 1., 0.])

Initialize the Optimizer

AdamW optimizer: We use AdamW, a variant of Adam (Adaptive Moment Estimation) that applies weight decay separately from the gradient update. Adam combines the advantages of two other optimization strategies, RMSprop (Root Mean Square Propagation) and AdaGrad (Adaptive Gradient Algorithm).

For each model parameter, it maintains moving averages of the gradient and the squared gradient, which are used to adapt the learning rate of each parameter during training.
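Creating the optimizer can be sketched as below, using `torch.optim.AdamW`. The learning rate of 2e-5 and the weight decay of 0.01 are assumed values that are common for BERT fine-tuning, not settings taken from this article, and a tiny linear layer stands in for the BERT model.

```python
import torch
from torch.optim import AdamW


def make_optimizer(model: torch.nn.Module, lr: float = 2e-5, weight_decay: float = 0.01):
    """Create an AdamW optimizer over all parameters of the model."""
    return AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)


# Tiny stand-in model, used only to show one optimization step.
model = torch.nn.Linear(4, 6)
optimizer = make_optimizer(model)
before = model.weight.detach().clone()

logits = model(torch.randn(8, 4))
loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, torch.zeros(8, 6))
loss.backward()        # compute gradients
optimizer.step()       # update weights using the moving averages
optimizer.zero_grad()  # clear gradients before the next batch
```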

Model Training
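A minimal sketch of a training loop with a validation pass after each epoch. It assumes the model returns an object with a `loss` attribute when labels are passed, as Hugging Face sequence classification models do, and that each batch is an `(input_ids, attention_mask, labels)` tuple. The function name and the three-epoch default are assumptions.

```python
import torch


def train_model(model, train_loader, val_loader, optimizer, device, num_epochs=3):
    """Fine-tune the model and return a list of (train_loss, val_loss) per epoch."""
    history = []
    for epoch in range(num_epochs):
        model.train()
        total_train = 0.0
        for batch in train_loader:
            input_ids, attention_mask, labels = (t.to(device) for t in batch)
            optimizer.zero_grad()
            outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
            outputs.loss.backward()
            optimizer.step()
            total_train += outputs.loss.item()

        # Validation: same forward pass, but without gradients or weight updates.
        model.eval()
        total_val = 0.0
        with torch.no_grad():
            for batch in val_loader:
                input_ids, attention_mask, labels = (t.to(device) for t in batch)
                outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
                total_val += outputs.loss.item()

        train_loss = total_train / len(train_loader)
        val_loss = total_val / len(val_loader)
        print(f"Epoch {epoch + 1}, Training Loss: {train_loss}, Validation loss: {val_loss}")
        history.append((train_loss, val_loss))
    return history
```

In the log that follows, the validation loss rises slightly in the third epoch while the training loss keeps falling, which often signals the start of overfitting.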

Output:

Epoch 1, Training Loss: 0.20543626952968852,Validation loss:0.1643741050479459
Epoch 2, Training Loss: 0.13793433358971502,Validation loss:0.14861836971021167
Epoch 3, Training Loss: 0.11418234390587034,Validation loss:0.1539663544862099

Model Evaluation

Let's evaluate the model now
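How the metrics below were computed is not stated, so the following is a sketch of one common convention: logits are passed through a sigmoid and thresholded at 0.5, accuracy is exact-match accuracy (every label of a comment must be correct), and precision and recall are micro-averaged over all labels. The function name and the small example tensors are illustrative.

```python
import torch


def multilabel_metrics(logits: torch.Tensor, labels: torch.Tensor, threshold: float = 0.5):
    """Exact-match accuracy plus micro-averaged precision and recall."""
    preds = (torch.sigmoid(logits) >= threshold).float()
    # A row counts as correct only if all of its labels match.
    accuracy = (preds == labels).all(dim=1).float().mean().item()
    tp = (preds * labels).sum().item()          # predicted 1, actually 1
    fp = (preds * (1 - labels)).sum().item()    # predicted 1, actually 0
    fn = ((1 - preds) * labels).sum().item()    # predicted 0, actually 1
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Three made-up comments with three labels each (fewer than six, for brevity).
logits = torch.tensor([[3.0, -3.0, 2.0], [-2.0, -2.0, -2.0], [1.0, 1.0, -1.0]])
labels = torch.tensor([[1.0, 0.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
metrics = multilabel_metrics(logits, labels)
print(metrics)
```

Exact-match accuracy is stricter than per-label precision and recall, which is one possible reason an accuracy figure can sit below both of them.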

Output:

Accuracy: 0.7099
Precision: 0.8059
Recall: 0.8691

The model reaches a precision of about 0.81 and a recall of about 0.87, so it finds most toxic labels at the cost of some false positives, while its accuracy is about 0.71.

Save the Model

Now, load the model

Load the Model
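Saving and reloading can be sketched together using the `save_pretrained` and `from_pretrained` methods of the transformers library. The helper names and the output directory are hypothetical.

```python
def save_model(model, tokenizer, output_dir):
    """Write the model weights and config, plus the tokenizer files, to output_dir."""
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)


def load_model(output_dir, device="cpu"):
    """Reload the fine-tuned classifier and its tokenizer from output_dir."""
    # Imported lazily so that transformers is only loaded when this helper is called.
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(output_dir)
    model = BertForSequenceClassification.from_pretrained(output_dir)
    model.to(device)
    model.eval()  # switch off dropout for inference
    return tokenizer, model
```

Saving the tokenizer alongside the model keeps the vocabulary and preprocessing settings in the same directory as the weights.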

Now comes the interesting part!

Prediction

Let's predict user input
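A minimal sketch of a prediction helper that returns a dictionary like the ones shown below. The function name, the 0.5 threshold, and the maximum length of 128 are assumptions; the model is expected to return an object with a `logits` attribute, as Hugging Face classification models do.

```python
import torch

LABEL_COLUMNS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def predict_user_input(text, model, tokenizer, device, max_length=128, threshold=0.5):
    """Return a 0/1 decision for each toxicity label of a single comment."""
    encoded = tokenizer(
        [text],
        max_length=max_length,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    )
    input_ids = encoded["input_ids"].to(device)
    attention_mask = encoded["attention_mask"].to(device)

    model.eval()
    with torch.no_grad():
        logits = model(input_ids, attention_mask=attention_mask).logits

    # One independent probability per label; each is thresholded on its own.
    probs = torch.sigmoid(logits)[0].tolist()
    return {name: int(p >= threshold) for name, p in zip(LABEL_COLUMNS, probs)}
```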

Output:

{'toxic': 1,
 'severe_toxic': 0,
 'obscene': 0,
 'threat': 0,
 'insult': 0,
 'identity_hate': 0}

We can observe that the comment 'Are you insane!' is classified as toxic.

Let's check for more inputs

Output:

{'toxic': 0,
 'severe_toxic': 0,
 'obscene': 0,
 'threat': 0,
 'insult': 0,
 'identity_hate': 0}

Well, obviously the comment 'How are you?' is not toxic, so all the label values are 0.

Output:

{'toxic': 1,
 'severe_toxic': 0,
 'obscene': 1,
 'threat': 0,
 'insult': 1,
 'identity_hate': 0}

As we can see, the comment "Such an Idiot person" is flagged for the labels toxic, obscene and insult, which is right. It is definitely not a threat or identity hate, so those values come out to be 0.


URL: https://www.geeksforgeeks.org/machine-learning/toxic-comment-classification-using-bert/
