VOOZH about

URL: https://towardsdatascience.com/exploring-emotions-with-artificial-intelligence-openai-and-exploratory-data-analysis-36a3882d3f11/

⇱ Exploring emotions with Artificial Intelligence, OpenAI, and Exploratory Data Analysis | Towards Data Science


Exploring emotions with Artificial Intelligence, OpenAI, and Exploratory Data Analysis

Here's how to visualize emotion in text with Python using OpenAI and Exploratory Data Analysis

10 min read
👁 Image by author made using Midjourney
Image by author made using Midjourney

I want to start by saying that I am a fan of old Disney movies more than the new ones.

I think that it has something to do with the fact that I was a kid when I watched older Disney movies and I have this feeling of nostalgia about the moment itself. Even if, by all means, I am not a movie expert, I also have the feeling that the old Disney movies are the ones with the best plots.

There is one remarkable exception though, and that is Inside Out. I watched that movie in the theater and was in love with it. I don’t want to spoil anything, I’ll say that the movie is about the idea that inside all of us there is a range of emotions:

  • Rage
  • Disgust
  • Joy
  • Fear
  • Sadness

And these emotions sometimes talk, like they are real people, inside of us. It’s an incredibly sweet movie with, I think, an amazing plot. When I heard that the new movie, Inside Out 2, is about to come out, I was so excited and I’m counting the days. 🙂

Now, let me anticipate your question:

"What does it have to do with AI?

When I was watching the trailer, I thought

"Can we generate what emotions would say?" "Can we use the ChatGPT technologies to simulate what emotion "rage" would say if they were a person?" "Can we explore emotions as vectors in a N dimensional space? And if that, can we plot it?

So buckle up, and let’s see what we can do 😏

Before we dive in, I want to highlight that this is a "game" based on the movie. I think that a world where emotions are replaced by AI would be a pretty sad one, and I’m not claiming that the procedure will do any of that in the first place.

This story is divided into 4 parts:

  1. The structure of the code
  2. The first part of the code, generating emotions. High focus on the code
  3. The second part of the code. This is less actually based on the code. It is more based on the Exploration of the dataset. This is where the fun starts!
  4. A summary
  5. The conclusions

This blog will be code-based and it is meant to be useful for developers. If you are not a developer, you can jump straight to chapter 3, where I will describe the result of the Emotion inspection using AI 🙂

0. The structure

This work is divided into two chapters:

  1. The generation of the "emotion" data: It will be a module with a main.py script that generates sentences. These sentences will be "movie-like" sentences like the Inside Out ones. We will generate 150 sentences per emotion: 150×5 = 750 sentences with OpenAI.
  2. Exploratory Data Analysis: This will be a notebook that will do the data exploration on the Open AI generated sentences.

Let’s describe how we are doing it…

The whole code is available on my GitHub page (https://github.com/PieroPaialungaAI/Emotion_AI/)

1. Emotion AI

We call the generation of the emotion data as EmotionAI.

In EmotionAI, I created 4 Python scripts:

  • constants.py
  • main.py
  • util.py
  • emotionai.py

You can find that in this Github folder. Let’s start from the bottom, by describing the main.py file:

1.1 Main.py

The main.py file has the idea of creating the emotions’ sentences. It does it using a class named AIEmotionGenerator, which is part of the emotionai.py code. Using AIEmotionGenerator, you first create the folders where you will store the sentences (50×3 per emotion), and then you generate the sentences and store them. Pretty easy, right?

The dirty work is done by emotionai.py let’s give a look:

1.2 emotionai.py

emotionai.py builds a class, named AIEmotionGenerator.

This class does two things:

  • Build the folder for you (boring) 🥱
  • Generate the sentence given an emotion (very interesting) 🤩

I generated emotions giving a hint to OpenAI’s GPT, which is the real sentences said by the emotions of the movie Inside Out. Let’s see what they are in constants.py

1.3 constants.py

So, this is the constants.py. In this file we have the sentences that we use to make GPT understand more about the task. You should fill it with your Open AI API key too… this is crucial to make it work*.

*The OpenAI API key will cost you money eventually, but I spent like $0.11 for this project. Keep an eye on the usage page of OpenAI, but don’t be stressed about it. It’s not an investment. 🙂

The sentences come from a movie. Notice that OpenAI is programmed not to give you answers that are ANGRY, or SAD, that’s why I changed the emotion to "Goofy Sadness" and "Funny Anger"… It’s a little bit frustrating that these technologies pretend that the world is all rainbow and ice cream if you ask me, but it did the trick.

2. EDA on the Emotion Sentences

Ok, so after you have generated the emotions, you will have your folders and they will look like this:

👁 Image by author
Image by author

We need to extract the .txt files and explore them. Let’s do it step by step:

2.0 Libraries

I used my long life friends: 👻

Notice that EmotionAI is in the library, we will need to use it (especially for the constants)

2.1 Importing the text

This part is a little bit boring but necessary, we import the text from the folders and fix it in a dataframe manner.

Unfortunately, the OpenAI response can be a little random, especially in the format. So please review the .txt files and delete stuff like "Sure! Let me give you the answer" or "I’m sorry that you feel sad, this is the list of answers"…

Once you have done it your dataset will look like this:

2.2 Visualization: WordCloud!

The wordcloud is a method to see the frequency of words in a text. This is what we will use, class by class (or emotion by emotion) to visualize the texts:

👁 Image

This is super fun. The disgust has the most said word "SMELL" and "SOCK". The Joy has world like "LAUGHTER" and "DELIGHT". The Sadness has words like "STUCK", "ACCIDENTALLY", "NEVER". Fear has words like "SCARED", "MIGHT", WORRY".

We are getting good stuff. Let’s keep going!

2.3 Embedding

Embedding is the idea of transforming words into numbers… or better… SENTENCES into SEQUENCES of NUMBERS. Like this!

👁 Image

The sentences with Awful and Terrible are close, the sentence with Beautiful is further away!

You can do the embedding step in so many ways, I chose to do it with OpenAI, mainly cause it’s quick and efficient, but choose your own method!

This is how you embed each sentence in your dataset, so that you have every sentence as a vector, just like this:

👁 Image

P.S. I’d be mad too if you replaced my Ice Cream with Broccoli…

2.4 Reduce the vectors dimensions…

Now we want to visualize the vectors. To do it, as our brain can only process 2 or 3D things, let’s us the PCA to reduce the dimensionality from 1500+ (vector dimensionality using GPT Embedding) to 2*.

  • Preprocess the input vectors:
  • Train the PCA:
  • Apply the PCA to the original vector

*I made this article about Principal Component Analysis if you want to understand how it works… 🙂

2.5 See the rainbow!

Now that we have our 2D dataset we can plot them and understand what’s going on. Let’s do it!

That’s pretty nice… you can distinguish…

  • Sadness in the lower part of the dataset, as a long stripe
  • Disgust in the right upper part of the dataset, as a diagonal stripe
  • Anger in the left upper part of the dataset, as a circle like structure

Let’s see how "Joy" is doing.

Fear and Joy are pretty separated, which is good, except for an area in the middle. I suspect that is because "Fear" can be "Joyful" when you see sentence like:

"I am so HAPPY that the spider didn’t approach me. I am terrified by spider"

That is like a Joy/Fear situation indeed 🙃

In this case, we also applied the PCA to the embedded Sentence of Fear in the movie, and we see that the sentence that Fear says in the movie (which was):

"I sure am glad you told me earthquakes are a myth Joy, otherwise i’d be terrified right now."

Is pretty much between the orange fellas of the "Fear" sentences.

If we do the same with the disgust sentence, we see that the disgust is correctly in between the green points, which is what we want.

The disgust sentence was:

"Okay, Caution, There Is A Dangerous Smell, People."

And even this one is kind of a Disgust/Fear situation. That is because emotions are not black and white, and you can feel a lot of emotions at the same time. If you were ever in love, you know this very well ❤

3. Conclusions

Thank you so much for reading this story. I had so much fun doing it. I love Inside Out and I think it’s very fun when we try to see if a computer can generate emotion-like sentences. In this story we:

  • Establish that Inside Out is amazing, and we used it as a starting point for the idea of making emotion "say things". If "fear" could talk, what would it say?
  • Used the GPT technology of OpenAI to generate sentences, based on the original ones of the movies. We made OpenAI generate 150 disgust-like sentences, 150 fear-like sentences, 150 joy-like sentences, and 150 sadness-like sentences.
  • Used Exploratory Data Analysis techniques like wordcloud, embedding, and PCA to visualize the results.

The sentences that GPT took out were pretty funny. Fear said:

"I am scared that my own shadow might come alive and start chasing me."

While Sadness said:

"I once got stuck in a revolving door because I forgot to push"

The wordcloud was amazing too, as it highlighted as the Sadness has that sense of Fear and uses words like "Never", while the disgust is almost related to the "smell" of things, which is pretty accurate.

When we visualized the sentences, we saw that the sentences formed clear clusters and that the original sentences of the movie were pretty much in the corresponding cluster. This is a sign of consistency and good prompting.

This work could be useful for many things like:

  • Suggesting the movie director new sentences using this technology
  • Understand the consistency of the characters by tracking their vectors
  • Create new emotions and feelings
  • Your ideas (comment on this post)

4. About me!

Thank you again for your time. It means a lot ❤

My name is Piero Paialunga and I’m this guy here:

👁 Image

I am a Ph.D. student at the University of Cincinnati Aerospace Engineering Department. I talk about AI, and Machine Learning in my blog posts and on Linkedin. If you liked the article and want to know more about machine learning and follow my studies you can:

A. Follow me on Linkedin, where I publish all my stories B. Subscribe to my newsletter. It will keep you updated about new stories and give you the chance to text me to receive all the corrections or doubts you may have. C. Become a referred member, so you won’t have any "maximum number of stories for the month" and you can read whatever I (and thousands of other Machine Learning and Data Science top writers) write about the newest technology available.

If you want to ask me questions or start a collaboration, leave a message here:

[email protected]

Ciao ❤️


Written By

Piero Paialunga

Towards Data Science is a community publication. Submit your insights to reach our global audience and earn through the TDS Author Payment Program.

Write for TDS

Related Articles