Emotion detection is an active research topic. Emotion-sensing technology can make communication between machines and humans easier, and it can also support better decision-making. Many machine learning models have been proposed to recognize emotions from text. In this article, our focus is on the Bidirectional LSTM model. Bidirectional LSTMs, or BiLSTMs for short, are an extension of regular LSTMs that can improve model performance on sequence classification problems. A BiLSTM trains two LSTMs on the sequential input. The first LSTM reads the input sequence as it is. The second LSTM reads a reversed copy of the input sequence. Together they give the model context from both directions, which often improves accuracy, although training two LSTMs costs more computation than training one.
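To make the two-direction idea concrete, here is a minimal sketch. It does not use a real LSTM cell: `toy_step` is a hypothetical, order-sensitive update rule that stands in for one, so the numbers only illustrate how the forward and backward passes are combined.

```python
from typing import Callable, List, Sequence


def run_recurrent(step: Callable[[float, float], float],
                  inputs: Sequence[float]) -> float:
    """Feed the inputs through `step` one at a time and return the final state."""
    state = 0.0
    for x in inputs:
        state = step(state, x)
    return state


def bidirectional(step: Callable[[float, float], float],
                  inputs: Sequence[float]) -> List[float]:
    """Run the same recurrence forward and on the reversed input, then concatenate."""
    forward = run_recurrent(step, inputs)
    backward = run_recurrent(step, list(reversed(inputs)))
    return [forward, backward]


def toy_step(state: float, x: float) -> float:
    # Hypothetical stand-in for an LSTM cell: older inputs fade, newer ones dominate.
    return 0.5 * state + x


print(bidirectional(toy_step, [1.0, 2.0, 3.0]))  # [4.25, 2.75]
```

The output has twice as many values as a single pass would produce, one set per direction.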
The dataset we have used is ISEAR (International Survey on Emotion Antecedents and Reactions).
The ISEAR dataset contains 7652 sentences, each labeled with one of seven emotions: joy, fear, anger, sadness, guilt, shame, and disgust.
Let's build the emotion-prediction model step by step.
The next step is to load the dataset from our machine and preprocess it. Some rows in the dataset contain the text 'No response' instead of a sentence. Such rows carry no information about the emotion, so we will drop them.
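The filtering step can be sketched as follows. This is only an illustration under assumptions: the rows are held as plain `(label, text)` tuples rather than in a data frame, the sample sentences are invented, and the case-insensitive match on 'no response' is our own choice.

```python
from typing import List, Tuple

Row = Tuple[str, str]  # (emotion label, sentence)


def drop_no_response(rows: List[Row]) -> List[Row]:
    """Keep only the rows whose text does not contain the 'No response' placeholder."""
    return [(label, text) for label, text in rows
            if "no response" not in text.lower()]


# Invented sample rows, for illustration only.
sample = [
    ("joy", "I passed my exam."),
    ("guilt", "[ No response.]"),
    ("fear", "I heard a loud noise at night."),
]

cleaned = drop_no_response(sample)
print(len(cleaned))  # 2
```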
Output:
   0        1
0  joy      [ On days when I feel close to my partner and ...
1  fear     Every time I imagine that someone I love or I ...
2  anger    When I had been obviously unjustly treated and...
3  sadness  When I think about the short time that we live...
4  disgust  At a gathering I found myself involuntarily si...
Apply a word tokenizer to convert each sentence into a list of words. For example, after tokenizing, the sentence 'I am happy' becomes the list ['I', 'am', 'happy'].
Output:
['[', 'On', 'days', 'when', 'I', 'feel', 'close', 'to', 'my', 'partner', 'and', 'other', 'friends', '.', 'When', 'I', 'feel', 'at', 'peace', 'with', 'myself', 'and', 'also', 'experience', 'a', 'close', 'contact', 'with', 'people', 'whom', 'I', 'regard', 'greatly', '.', ']']
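The tokenization step can be approximated with a regular expression. This is a stand-in, not necessarily the tokenizer used to produce the output above: a library tokenizer such as NLTK's handles contractions and other edge cases differently. The pattern below simply keeps runs of word characters together and splits every punctuation mark into its own token.

```python
import re
from typing import List

# Runs of word characters, or any single non-space, non-word character.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence: str) -> List[str]:
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(sentence)


print(tokenize("I am happy"))
# ['I', 'am', 'happy']
print(tokenize("[ On days when I feel close. ]"))
# ['[', 'On', 'days', 'when', 'I', 'feel', 'close', '.', ']']
```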
The sentences have different lengths, but the model expects every input to have the same length. By visualizing the dataset, we can see that no sentence is longer than 100 words. So we will bring every sentence to exactly 100 tokens by padding the shorter ones.
Output:
['[', 'On', 'days', 'when', 'I', 'feel', 'close', 'to', 'my', 'partner', 'and', 'other', 'friends', '.', 'When', 'I', 'feel', 'at', 'peace', 'with', 'myself', 'and', 'also', 'experience', 'a', 'close', 'contact', 'with', 'people', 'whom', 'I', 'regard', 'greatly', '.', ']', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>', '<pad>']
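A minimal sketch of the padding step is shown below. The `<pad>` token and the length of 100 come from the text above; the function name and the truncation of longer inputs are our own assumptions (truncation should never trigger on this dataset, since no sentence exceeds 100 words).

```python
from typing import List


def pad_tokens(tokens: List[str], max_len: int = 100,
               pad_token: str = "<pad>") -> List[str]:
    """Return a copy of `tokens` that is exactly `max_len` long."""
    clipped = tokens[:max_len]  # assumption: cut anything beyond max_len
    return clipped + [pad_token] * (max_len - len(clipped))


padded = pad_tokens(["I", "am", "happy"])
print(len(padded), padded[:5])
# 100 ['I', 'am', 'happy', '<pad>', '<pad>']
```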
Now, each word needs to be converted into a numeric representation, as the model only accepts numeric input. For this, we have downloaded pretrained GloVe vectors of 50 dimensions. These vectors are used for word embedding: each word is represented by a vector of 50 numbers.
The GloVe file covers most common English words.
The first item of each row is the word to be embedded. The remaining columns, from the second to the last, hold the numeric representation of that word as a 50-d vector.
Create the embeddings_index dictionary, which maps each word to its corresponding vector.
Output:
array([ 0.092086, 0.2571 , -0.58693 , -0.37029 , 1.0828 , -0.55466 , -0.78142 , 0.58696 , -0.58714 , 0.46318 , -0.11267 , 0.2606 , -0.26928 , -0.072466, 1.247 , 0.30571 , 0.56731 , 0.30509 , -0.050312, -0.64443 , -0.54513 , 0.86429 , 0.20914 , 0.56334 , 1.1228 , -1.0516 , -0.78105 , 0.29656 , 0.7261 , -0.61392 , 2.4225 , 1.0142 , -0.17753 , 0.4147 , -0.12966 , -0.47064 , 0.3807 , 0.16309 , -0.323 , -0.77899 , -0.42473 , -0.30826 , -0.42242 , 0.055069, 0.38267 , 0.037415, -0.4302 , -0.39442 , 0.10511 , 0.87286 ], dtype=float32)
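Building such a dictionary from a GloVe-style text file can be sketched as follows. The function name `load_glove` is hypothetical, and the sketch stores plain Python lists where the output above shows a NumPy `float32` array. To keep the example runnable without the real file, it reads from a tiny in-memory stand-in with made-up 3-d vectors.

```python
import io
from typing import Dict, Iterable, List


def load_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse rows of the form 'word v1 v2 ... vN' into a word -> vector dictionary."""
    embeddings_index: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed rows
        word, values = parts[0], parts[1:]
        embeddings_index[word] = [float(v) for v in values]
    return embeddings_index


# Made-up 3-d rows standing in for the real 50-d file.
fake_file = io.StringIO("happy 0.1 -0.2 0.3\nsad -0.4 0.5 -0.6\n")
embeddings_index = load_glove(fake_file)
print(embeddings_index["happy"])  # [0.1, -0.2, 0.3]
```

With the real file, the same function can be called on an open file handle, for example `with open(path, encoding="utf-8") as f: load_glove(f)`, where `path` points to the downloaded vectors.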
Now, each word of the dataset is embedded as a 50-dimensional vector with the help of the dictionary built above.
Output:
[-0.61201 0.98226 0.11539 0.014623 0.23873 -0.067035 0.30632 -0.64742 -0.38517 -0.03691 0.094788 0.57631 -0.091557 -0.54825 0.25255 -0.14759 0.13023 0.21658 -0.30623 0.30028 -0.23471 -0.17927 0.9518 0.54258 0.31172 -0.51038 -0.65223 -0.48858 0.13486 -0.40132 2.493 -0.38777 -0.26456 -0.49414 -0.3871 -0.20983 0.82941 -0.46253 0.39549 0.014881 0.79485 -0.79958 -0.16243 0.013862 -0.53536 0.52536 0.019818 -0.16353 0.30649 0.81745 ]
In the example above, the embeddings_index dictionary maps each word to its corresponding 50-d vector.
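The lookup for a whole padded sentence can be sketched like this. Two choices here are assumptions rather than something the article states: words missing from the dictionary (including `<pad>`) are mapped to an all-zero vector, and words are lowercased before lookup. The example uses made-up 3-d vectors to stay short.

```python
from typing import Dict, List


def embed_tokens(tokens: List[str],
                 embeddings_index: Dict[str, List[float]],
                 dim: int = 50) -> List[List[float]]:
    """Turn a token list into a len(tokens) x dim matrix of word vectors."""
    zero = [0.0] * dim  # assumption: unknown words and padding become zeros
    # list(...) copies each vector so rows never share the same list object.
    return [list(embeddings_index.get(tok.lower(), zero)) for tok in tokens]


toy_index = {"happy": [0.1, -0.2, 0.3]}  # made-up 3-d vector
matrix = embed_tokens(["Happy", "<pad>"], toy_index, dim=3)
print(matrix)  # [[0.1, -0.2, 0.3], [0.0, 0.0, 0.0]]
```

For a sentence padded to 100 tokens and 50-d vectors, the result has 100 rows of 50 values each.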
Now that all the preprocessing is done, the next step is to build the model and train it.
Output:
Epoch 1/20
95/95 [==============================] - 13s 99ms/step - loss: 1.8433 - accuracy: 0.2545
Epoch 2/20
95/95 [==============================] - 11s 118ms/step - loss: 1.6367 - accuracy: 0.3751
Epoch 3/20
95/95 [==============================] - 10s 103ms/step - loss: 1.5475 - accuracy: 0.4102
Epoch 4/20
95/95 [==============================] - 9s 98ms/step - loss: 1.4917 - accuracy: 0.4404
Epoch 5/20
95/95 [==============================] - 11s 112ms/step - loss: 1.4453 - accuracy: 0.4531
Epoch 6/20
95/95 [==============================] - 13s 133ms/step - loss: 1.4109 - accuracy: 0.4777
Epoch 7/20
95/95 [==============================] - 12s 129ms/step - loss: 1.3521 - accuracy: 0.4934
Epoch 8/20
95/95 [==============================] - 10s 100ms/step - loss: 1.2956 - accuracy: 0.5262
Epoch 9/20
95/95 [==============================] - 11s 113ms/step - loss: 1.2651 - accuracy: 0.5360
Epoch 10/20
95/95 [==============================] - 11s 110ms/step - loss: 1.2126 - accuracy: 0.5602
Epoch 11/20
95/95 [==============================] - 10s 101ms/step - loss: 1.1970 - accuracy: 0.5719
Epoch 12/20
95/95 [==============================] - 10s 105ms/step - loss: 1.1226 - accuracy: 0.5942
Epoch 13/20
95/95 [==============================] - 9s 91ms/step - loss: 1.0961 - accuracy: 0.6084
Epoch 14/20
95/95 [==============================] - 8s 88ms/step - loss: 1.0746 - accuracy: 0.6168
Epoch 15/20
95/95 [==============================] - 9s 97ms/step - loss: 1.0081 - accuracy: 0.6376
Epoch 16/20
95/95 [==============================] - 10s 102ms/step - loss: 0.9750 - accuracy: 0.6502
Epoch 17/20
95/95 [==============================] - 9s 97ms/step - loss: 0.9394 - accuracy: 0.6683
Epoch 18/20
95/95 [==============================] - 10s 103ms/step - loss: 0.9235 - accuracy: 0.6693
Epoch 19/20
95/95 [==============================] - 11s 112ms/step - loss: 0.8728 - accuracy: 0.6936
Epoch 20/20
95/95 [==============================] - 10s 102ms/step - loss: 0.8256 - accuracy: 0.7079
Output:
Model: "sequential"
_________________________________________________________________
 Layer (type)                    Output Shape       Param #
=================================================================
 bidirectional (Bidirectional)   (None, 200)        120800
 dropout (Dropout)               (None, 200)        0
 dense (Dense)                   (None, 7)          1407
=================================================================
Total params: 122,207
Trainable params: 122,207
Non-trainable params: 0
This is the diagram of the proposed model:
Here, the dimension of the input is 100 x 50, where 100 is the number of words in each input sentence of the dataset and 50 is the size of the vector that represents each word.
The output of Bidirectional(LSTM) has 200 dimensions because we defined the dimensionality of the LSTM output space to be 100. A BiLSTM contains two LSTM layers, one forward and one backward, and their outputs are concatenated, so the dimensionality is 100*2 = 200.
After this, a dropout layer is added to reduce overfitting. Finally, a dense layer maps the 200-dimensional output to 7 values, one for each of the seven emotions.
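The parameter counts in the model summary can be checked by hand. The sketch below uses the standard parameter count of an LSTM layer (four gates, each with input weights, recurrent weights and a bias); the helper names are our own.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Parameters of one LSTM layer: 4 gates x (input + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, outputs: int) -> int:
    """Parameters of a dense layer: one weight per input-output pair plus biases."""
    return input_dim * outputs + outputs


bilstm = 2 * lstm_params(input_dim=50, units=100)  # forward + backward LSTM
dense = dense_params(input_dim=200, outputs=7)
print(bilstm, dense, bilstm + dense)  # 120800 1407 122207
```

These values match the 120800, 1407 and 122,207 reported in the summary; the dropout layer has no parameters.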
Testing the model
Output:
1515/1515 [==============================] - 11s 7ms/step - loss: 1.4335 - accuracy: 0.5109
[1.4335393905639648, 0.5108910799026489]
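The accuracy of the model on the test data is about 51%. This is well below the roughly 71% reached on the training data in the final epoch, a gap that suggests the model is overfitting.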
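For reference, this is how an accuracy figure like the one above is obtained from the model's 7-value outputs: each prediction is the emotion with the highest score, and accuracy is the fraction of predictions that match the true labels. The order of the labels in `EMOTIONS` is an assumption, and the scores are made up for illustration.

```python
from typing import List, Sequence

# Assumed label order; the actual order depends on how the labels were encoded.
EMOTIONS = ["joy", "fear", "anger", "sadness", "disgust", "shame", "guilt"]


def predict_label(scores: Sequence[float]) -> str:
    """Return the emotion whose score is highest."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return EMOTIONS[best]


def accuracy(predicted: List[str], actual: List[str]) -> float:
    """Fraction of positions where the prediction equals the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


scores = [0.05, 0.10, 0.60, 0.05, 0.10, 0.05, 0.05]  # made-up model output
print(predict_label(scores))                          # anger
print(accuracy(["joy", "fear"], ["joy", "anger"]))    # 0.5
```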