Twitter Sentiment Analysis of Uncut Gems
Sentiment Analysis of Tweets about Uncut Gems: Removing False Negatives
Uncut Gems is a crime/thriller film starring Adam Sandler, Julia Fox, Lakeith Stanfield and former NBA player Kevin Garnett. Written and directed by Josh and Benny Safdie, Uncut Gems is a film ten years in the making. The plot follows an NYC jeweler and gambling addict who must retrieve a black opal uncut gem, mined in Ethiopia, in order to sell it and pay off his debts. The film is A24’s highest-grossing release so far, bringing in $40 million at the time of writing.
In my personal experience, leaving aside the rave reviews the film has received from professional movie critics, the movie has been getting mixed reviews from my peers. This inspired me to carry out sentiment analysis on tweets about Uncut Gems. In this post, we will use the Python Twitter API wrapper, Tweepy, to retrieve tweets about the movie and then perform sentiment analysis on these tweets using another Python library called TextBlob.
Let’s get started!
First, you’ll need to apply for a Twitter developer account.
After your developer account has been approved, you need to create a Twitter application.
The steps for applying for a Twitter developer account and creating a Twitter application are outlined here.
We will be using the free Python library Tweepy to access the Twitter API. Documentation for Tweepy can be found here.
- INSTALLATION
First, make sure you have tweepy installed. Open up a command line and type:
pip install tweepy
- IMPORT LIBRARIES
Next, open up your favorite editor and import the tweepy and pandas libraries:
import tweepy
import pandas as pd
- AUTHENTICATION
Next, we need the consumer key and secret, plus the access token and secret, generated for the application.
Notice that the site suggests that you keep your keys and tokens private! Here we define fake values, but you should use the real ones from the Twitter application you created above:
consumer_key = '5GBi0dCerYpy2jJtkkU3UwqYtgJpRd'
consumer_secret = 'Q88B4BDDAX0dCerYy2jJtkkU3UpwqY'
access_token = 'X0dCerYpwi0dCerYpwy2jJtkkU3U'
access_token_secret = 'kly2pwi0dCerYpjJtdCerYkkU3Um'
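Rather than pasting real credentials into a script, one common way to keep them private is to read them from environment variables. The sketch below assumes hypothetical variable names such as TWITTER_CONSUMER_KEY; `load_twitter_credentials` is an illustrative helper, not part of Tweepy.

```python
import os

# Hypothetical environment variable names; pick whatever names suit your setup.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(env=os.environ):
    """Return the four credentials as a tuple, failing loudly if any are missing."""
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)
```

The four returned values can be unpacked into consumer_key, consumer_secret, access_token and access_token_secret and used exactly as above, so no secret ever needs to appear in the source file.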
The next step is creating an OAuthHandler instance. We pass in the consumer key and consumer secret defined above, then set the access token and access token secret:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
Next, we pass the OAuthHandler instance to the tweepy.API constructor:
api = tweepy.API(auth)
- TWITTER API REQUESTS
Next, we initialize lists for the fields we are interested in analyzing. For now, we can look at the tweet text, the user, and the time of the tweet. We then write a for loop over a tweepy ‘Cursor’ object. Into the ‘Cursor’ we pass the ‘api.search’ method (renamed ‘api.search_tweets’ in Tweepy 4.x), the query string we would like to search for, and ‘count’, the number of tweets requested per page (the standard search endpoint returns at most 100 per request). Here we will search for tweets about ‘Uncut Gems’. Calling the ‘items()’ method turns the ‘Cursor’ into an iterator over individual tweets, and passing 1000 to it caps the total, which keeps the number of requests bounded so that we don’t exceed the Twitter rate limit.
To keep the results simple, we skip retweets and only keep tweets in English. To get a sense of what this request returns, we can also print the values being appended to each list:
twitter_users = []
tweet_time = []
tweet_string = []
for tweet in tweepy.Cursor(api.search, q='Uncut Gems', count=1000).items(1000):
    if (not tweet.retweeted) and ('RT @' not in tweet.text):
        if tweet.lang == "en":
            twitter_users.append(tweet.user.name)
            tweet_time.append(tweet.created_at)
            tweet_string.append(tweet.text)
            print([tweet.user.name, tweet.created_at, tweet.text])
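To make the roles of ‘count’ and ‘items()’ concrete, here is a toy, offline model of cursor-style paging: each simulated request returns one page of at most `per_page` results, and the total is capped with `itertools.islice`, much as `.items(1000)` caps the loop above. `fetch_page` and `iter_items` are hypothetical stand-ins, not Tweepy's actual implementation.

```python
from itertools import islice

requests_made = []  # page numbers of every simulated request, in order


def fetch_page(page_number, per_page, total_available=2500):
    """Hypothetical stand-in for one search request: returns a page of fake tweet ids."""
    requests_made.append(page_number)
    start = page_number * per_page
    stop = min(start + per_page, total_available)
    return list(range(start, stop))


def iter_items(per_page):
    """Yield results one at a time, fetching a new page only when the last one runs out."""
    page_number = 0
    while True:
        page = fetch_page(page_number, per_page)
        if not page:  # an empty page means there are no more results
            return
        yield from page
        page_number += 1


# Ask for 100 results per page and cap the total at 1000, like .items(1000).
results = list(islice(iter_items(per_page=100), 1000))
print(len(results), "items from", len(requests_made), "requests")
```

Because the generator only fetches a new page when the previous one is used up, capping the total number of items also bounds the number of (simulated) API calls.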
For reusability, we can wrap it all up in a function that takes the keyword as input. We can also store the results in a DataFrame and return it:
def get_related_tweets(key_word):
    twitter_users = []
    tweet_time = []
    tweet_string = []
    for tweet in tweepy.Cursor(api.search, q=key_word, count=1000).items(1000):
        if (not tweet.retweeted) and ('RT @' not in tweet.text):
            if tweet.lang == "en":
                twitter_users.append(tweet.user.name)
                tweet_time.append(tweet.created_at)
                tweet_string.append(tweet.text)
                print([tweet.user.name, tweet.created_at, tweet.text])
    df = pd.DataFrame({'name': twitter_users, 'time': tweet_time, 'tweet': tweet_string})
    return df
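One caveat about the retweet filter: in the v1.1 API, a tweet's `retweeted` field indicates whether the authenticating user has retweeted it, not whether the tweet itself is a retweet, so the ‘RT @’ check is doing most of the work. Below is a minimal sketch of a stricter filter that also checks for the `retweeted_status` attribute that native retweets carry. `keep_tweet` and `collect_rows` are hypothetical helpers, and `SimpleNamespace` objects stand in for real Tweepy status objects so the logic can be tried offline.

```python
from datetime import datetime
from types import SimpleNamespace


def keep_tweet(tweet):
    """Return True for English tweets that are not retweets."""
    # Native retweets carry a `retweeted_status` attribute; the "RT @" check
    # mirrors the original filter and also catches manual retweets.
    is_retweet = hasattr(tweet, "retweeted_status") or "RT @" in tweet.text
    return not is_retweet and tweet.lang == "en"


def collect_rows(tweets):
    """Return (user name, created_at, text) for every tweet that passes the filter."""
    return [(t.user.name, t.created_at, t.text) for t in tweets if keep_tweet(t)]


# Offline stand-ins for Tweepy status objects (hypothetical sample data).
user = SimpleNamespace(name="example_user")
when = datetime(2020, 1, 10, 12, 0)
sample_tweets = [
    SimpleNamespace(user=user, created_at=when, text="Loved it", lang="en"),
    SimpleNamespace(user=user, created_at=when, text="RT @someone: Loved it", lang="en"),
    SimpleNamespace(user=user, created_at=when, text="Loved it", lang="en",
                    retweeted_status=object()),
    SimpleNamespace(user=user, created_at=when, text="Me encantó", lang="es"),
]
rows = collect_rows(sample_tweets)
print(rows)
```

With real data, the rows produced from a Tweepy cursor could go straight into `pd.DataFrame(rows, columns=['name', 'time', 'tweet'])`.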
We can call the function with the keyword ‘Uncut Gems’:
get_related_tweets('Uncut Gems')
We can also pass in the keyword "Adam Sandler":
get_related_tweets('Adam Sandler')
We can also pass in the keyword "Julia Fox":
get_related_tweets('Julia Fox')
And "Safdie Brothers":
get_related_tweets('Safdie Brothers')
In order to get sentiment scores, we need to import a Python package called TextBlob. The documentation for TextBlob can be found here. To install TextBlob, open a command line and type:
pip install textblob
Next, import TextBlob:
from textblob import TextBlob
We will use the polarity score as our measure of positive or negative sentiment. The polarity score is a float ranging from -1.0 (most negative) to +1.0 (most positive).
For example, if we define a TextBlob object and pass in the sentence "Uncut Gems is the best!":
sentiment_score = TextBlob("Uncut Gems is the best!").sentiment.polarity
print("Sentiment Polarity Score:", sentiment_score)
We can also try "Adam Sandler is amazing!":
sentiment_score = TextBlob("Adam Sandler is amazing!").sentiment.polarity
print("Sentiment Polarity Score:", sentiment_score)
A flaw I’ve noticed in using TextBlob is that it puts heavy weight on negatively scored words even when positive adjectives are present, which can inflate false negatives. In particular, the word ‘Uncut’ in the movie title significantly lowers the sentiment score. For example, consider the sentiment scores of "This movie is amazing" vs. "Uncut Gems is amazing!":
sentiment_score = TextBlob("This movie is amazing").sentiment.polarity
print("Sentiment Polarity Score:", sentiment_score)
sentiment_score = TextBlob("Uncut Gems is amazing!").sentiment.polarity
print("Sentiment Polarity Score:", sentiment_score)
We can see that "Uncut Gems is amazing!" still scores as positive, but significantly lower than "This movie is amazing", when the two should be close or equal in value. As a quick fix, we will remove the word "Uncut" from each tweet and generate sentiment scores from the result.
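Why would a single word matter so much? In simplified form, lexicon-based scorers such as TextBlob's default analyzer look up words in a polarity lexicon and average the scores of the words they find (the real implementation also handles modifiers such as "very" and negations such as "not"). The toy model below uses made-up lexicon values, not TextBlob's, to show how one negatively scored word in a title pulls the average toward zero. `TOY_LEXICON` and `toy_polarity` are hypothetical names.

```python
# Made-up polarity values for illustration only; these are NOT TextBlob's lexicon entries.
TOY_LEXICON = {"amazing": 0.6, "best": 1.0, "uncut": -0.5}


def toy_polarity(sentence):
    """Average the lexicon scores of the words we recognize; 0.0 if none are recognized."""
    words = [word.strip("!?.,").lower() for word in sentence.split()]
    scores = [TOY_LEXICON[word] for word in words if word in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


for text in ["This movie is amazing", "Uncut Gems is amazing!", "Gems is amazing!"]:
    print(f"{text!r}: {toy_polarity(text):.2f}")
```

In this toy model, dropping "Uncut" brings the score back to that of "This movie is amazing", which is the intuition behind the quick fix described above.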
Let’s get sentiment polarity scores for tweets about "Uncut Gems" and store them in a DataFrame (before removing the word "Uncut"):
df = get_related_tweets("Uncut Gems")
df['sentiment'] = df['tweet'].apply(lambda tweet: TextBlob(tweet).sentiment.polarity)
print(df.head())
We can also count the number of positive and negative sentiments:
df_pos = df[df['sentiment'] > 0.0]
df_neg = df[df['sentiment'] < 0.0]
print("Number of Positive Tweets", len(df_pos))
print("Number of Negative Tweets", len(df_neg))
As we can see, there are significantly more negative tweets about "Uncut Gems" than positive ones, but again, this may be because the word "Uncut" in the movie title is producing false negatives.
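Note that the two filters above silently drop tweets whose polarity is exactly 0.0, which happens, for example, when none of a tweet's words carry any sentiment in the lexicon. Here is a small sketch that counts neutral tweets as well, so the totals add up to the number of tweets collected; `sentiment_label` and `count_labels` are hypothetical helper names.

```python
from collections import Counter


def sentiment_label(polarity):
    """Map a polarity score in [-1.0, 1.0] to a label; exactly 0.0 counts as neutral."""
    if polarity > 0.0:
        return "positive"
    if polarity < 0.0:
        return "negative"
    return "neutral"


def count_labels(polarities):
    """Count positive, negative and neutral scores, always reporting all three labels."""
    counts = Counter(sentiment_label(p) for p in polarities)
    return {label: counts[label] for label in ("positive", "negative", "neutral")}


counts = count_labels([0.5, -0.2, 0.0, 0.0, 0.8])
print(counts)  # {'positive': 2, 'negative': 1, 'neutral': 2}
```

On the DataFrame, the same labels could be produced with `df['sentiment'].apply(sentiment_label).value_counts()`.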
Let’s modify the DataFrame by removing the word "Uncut" from the tweets:
df['tweet'] = df['tweet'].str.replace('Uncut', '')
df['tweet'] = df['tweet'].str.replace('uncut', '')
df['tweet'] = df['tweet'].str.replace('UNCUT', '')
df['sentiment'] = df['tweet'].apply(lambda tweet: TextBlob(tweet).sentiment.polarity)
print(df.head())
df_pos = df[df['sentiment'] > 0.0]
df_neg = df[df['sentiment'] < 0.0]
print("Number of Positive Tweets", len(df_pos))
print("Number of Negative Tweets", len(df_neg))
We can see that there are significantly more positive tweets when we remove the word "Uncut".
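The three `str.replace` calls above only cover three spellings. A sketch of a more general alternative, using Python's `re` module, removes "uncut" as a whole word in any letter case and tidies up the leftover whitespace; `strip_word` is a hypothetical helper name. Because of the `\b` word boundaries, a hashtag such as `#UncutGems` is left untouched, unlike with the substring replacement above; drop the `\b` anchors if you want the original substring behavior.

```python
import re

# Matches "uncut" as a whole word in any letter case (Uncut, UNCUT, uNcUt, ...).
UNCUT_PATTERN = re.compile(r"\buncut\b", flags=re.IGNORECASE)


def strip_word(text, pattern=UNCUT_PATTERN):
    """Remove every match of `pattern` and collapse the leftover whitespace."""
    without_word = pattern.sub("", text)
    return " ".join(without_word.split())


print(strip_word("UNCUT GEMS is amazing!"))
```

It could be applied to the DataFrame with `df['tweet'] = df['tweet'].apply(strip_word)` before recomputing the sentiment column.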
For code reuse, we can wrap it all up in a function:
def get_sentiment(key_word):
    df = get_related_tweets(key_word)
    df['tweet'] = df['tweet'].str.replace('Uncut', '')
    df['tweet'] = df['tweet'].str.replace('uncut', '')
    df['tweet'] = df['tweet'].str.replace('UNCUT', '')
    df['sentiment'] = df['tweet'].apply(lambda tweet: TextBlob(tweet).sentiment.polarity)
    df_pos = df[df['sentiment'] > 0.0]
    df_neg = df[df['sentiment'] < 0.0]
    print("Number of Positive Tweets about {}".format(key_word), len(df_pos))
    print("Number of Negative Tweets about {}".format(key_word), len(df_neg))
If we call this function with "Uncut Gems", we get:
get_sentiment("Uncut Gems")
It would be convenient to visualize these results programmatically. Let’s import seaborn and matplotlib and modify our get_sentiment function:
import seaborn as sns
import matplotlib.pyplot as plt
def get_sentiment(key_word):
    df = get_related_tweets(key_word)
    # Remove the word "Uncut" so it does not drag down the polarity scores
    df['tweet'] = df['tweet'].str.replace('Uncut', '')
    df['tweet'] = df['tweet'].str.replace('uncut', '')
    df['tweet'] = df['tweet'].str.replace('UNCUT', '')
    df['sentiment'] = df['tweet'].apply(lambda tweet: TextBlob(tweet).sentiment.polarity)
    df_pos = df[df['sentiment'] > 0.0]
    df_neg = df[df['sentiment'] < 0.0]
    print("Number of Positive Tweets about {}".format(key_word), len(df_pos))
    print("Number of Negative Tweets about {}".format(key_word), len(df_neg))
    # Bar chart of positive vs. negative tweet counts
    sns.set()
    labels = ['Positive', 'Negative']
    heights = [len(df_pos), len(df_neg)]
    plt.bar(labels, heights, color='navy')
    plt.title(key_word)
    plt.show()
get_sentiment("Uncut Gems")
We can also call the function with "Adam Sandler":
get_sentiment("Adam Sandler")
And "Julia Fox":
get_sentiment("Julia Fox")
And "Kevin Garnett":
get_sentiment("Kevin Garnett")
And "Lakeith Stanfield":
get_sentiment("Lakeith Stanfield")
As you can see, tweets about "Uncut Gems" and its stars have more positive than negative sentiment.
To recap, in this post we went over how to pull tweets from Twitter using the Python Twitter API wrapper Tweepy. We also reviewed the Python sentiment analysis package TextBlob and how we can use it to generate sentiment scores from tweets. Finally, we showed how we can modify tweets by removing the word "Uncut", which artificially deflated sentiment scores. It would be interesting to collect a few days of data to see how sentiment changes with time. Maybe I will save that for a future post!
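As a starting point for tracking sentiment over time, here is a minimal sketch that averages polarity per calendar day from (timestamp, polarity) pairs, such as the ‘time’ and ‘sentiment’ columns built above. `daily_mean_sentiment` is a hypothetical helper, the sample values are illustrative rather than real tweets, and days are bucketed in whatever time zone the timestamps carry.

```python
from collections import defaultdict
from datetime import datetime


def daily_mean_sentiment(rows):
    """Average polarity per calendar day from (created_at, polarity) pairs, sorted by day."""
    by_day = defaultdict(list)
    for created_at, polarity in rows:
        by_day[created_at.date()].append(polarity)
    return {day: sum(scores) / len(scores) for day, scores in sorted(by_day.items())}


# Illustrative values only, not real tweets.
sample = [
    (datetime(2020, 1, 10, 9, 30), 0.4),
    (datetime(2020, 1, 10, 18, 5), -0.2),
    (datetime(2020, 1, 11, 12, 0), 0.6),
]
daily = daily_mean_sentiment(sample)
print(daily)
```

With pandas, a comparable result could come from `df.groupby(df['time'].dt.date)['sentiment'].mean()`.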
Thank you for reading. The code from this post is available on GitHub.
If you enjoyed this article, you can make a small contribution on my Patreon linked here.
Good luck and Happy Machine Learning!
Towards Data Science is a community publication. Submit your insights to reach our global audience and earn through the TDS Author Payment Program.