How To Perform Sentiment Analysis in Python 3 Using the Natural Language Toolkit (NLTK)

Published on September 27, 2019

The author selected the Open Internet/Free Speech fund to receive a donation as part of the Write for DOnations program.

Introduction

Much of the data generated today is unstructured and must be processed before it yields insights. Examples of unstructured data include news articles, social media posts, and search history. Analyzing natural language and making sense of it falls under the field of Natural Language Processing (NLP). Sentiment analysis is a common NLP task that involves classifying texts, or parts of texts, into pre-defined sentiment categories. You will use the Natural Language Toolkit (NLTK), a commonly used NLP library in Python, to analyze textual data.

In this tutorial, you will prepare a dataset of sample tweets from the NLTK package for NLP using different data cleaning methods. Once the dataset is ready for processing, you will train a model on pre-classified tweets and use the model to classify the sample tweets into negative and positive sentiments.
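As a rough preview of that workflow, here is a minimal sketch that trains NLTK's `NaiveBayesClassifier` on a handful of pre-classified token lists and then classifies a new one. The `labeled_tweets` data and the `to_features` helper are hypothetical stand-ins written for this illustration; the tutorial itself works with NLTK's sample tweets and applies cleaning steps that are omitted here.

```python
from nltk import NaiveBayesClassifier

# Hypothetical, hand-written stand-ins for pre-classified tweets.
# Each entry is (list of tokens, sentiment label).
labeled_tweets = [
    (["love", "this", ":)"], "Positive"),
    (["great", "day", ":)"], "Positive"),
    (["happy", "and", "glad"], "Positive"),
    (["hate", "this", ":("], "Negative"),
    (["awful", "day", ":("], "Negative"),
    (["sad", "and", "tired"], "Negative"),
]


def to_features(tokens):
    """Convert a token list into the feature dictionary NLTK classifiers accept."""
    return {token: True for token in tokens}


# Train on (feature dictionary, label) pairs.
training_set = [(to_features(tokens), label) for tokens, label in labeled_tweets]
classifier = NaiveBayesClassifier.train(training_set)

# Classify a new, unlabeled list of tokens.
prediction = classifier.classify(to_features(["what", "a", "great", "day", ":)"]))
print(prediction)
```

With only six training examples the result says little about real accuracy; the sketch is meant to show the shape of the data at each step, not a usable model.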

This article assumes that you are familiar with the basics of Python (see our How To Code in Python 3 series), primarily the use of data structures, classes, and methods. The tutorial assumes no background in NLP or NLTK, although some knowledge of either is an added advantage.

About the author(s)

Shaumik Daityari

Author

Shaumik is an optimist, but one who carries an umbrella. An undergrad at IITR, he loves writing, when he's not busy keeping the blue flag flying high.

Haley Mills

Editor

Hilko Kriel

December 11, 2019

Hi Shaumik,

Thank you very much for this brilliant tutorial. I’m in the process of developing a few custom tools for Alteryx and this tutorial was absolutely legendary!!

mantaskantautas

February 26, 2020

Really interesting read, though I wonder about the speed. Would hosting this as a service behind an API endpoint on Lambda or Cloud Functions make the response time usable in real-world scenarios, or do you have any other tips on the matter?

mantaskantautas

February 26, 2020

So how can we alter the logic so that the training part only needs to be done once, since it takes a lot of time and resources? In real-life scenarios, most of the time only the custom sentence will change.
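One common way to avoid retraining is to serialize the trained classifier with Python's standard `pickle` module and load it again in later runs. The sketch below assumes a tiny hypothetical training set and a hypothetical file name (`sentiment_classifier.pickle` in the system's temporary directory); it is not part of the tutorial's own code.

```python
import os
import pickle
import tempfile

from nltk import NaiveBayesClassifier

# Tiny hypothetical training set standing in for the tutorial's cleaned tweets.
training_set = [
    ({"love": True, ":)": True}, "Positive"),
    ({"great": True, ":)": True}, "Positive"),
    ({"hate": True, ":(": True}, "Negative"),
    ({"awful": True, ":(": True}, "Negative"),
]

# Hypothetical file name; any writable path works.
model_path = os.path.join(tempfile.gettempdir(), "sentiment_classifier.pickle")

# Expensive step, done once: train the model and save it to disk.
classifier = NaiveBayesClassifier.train(training_set)
with open(model_path, "wb") as model_file:
    pickle.dump(classifier, model_file)

# Cheap step, done on every later run: load the saved model and classify.
with open(model_path, "rb") as model_file:
    restored = pickle.load(model_file)

restored_prediction = restored.classify({"great": True, ":)": True})
print(restored_prediction)
```

Only unpickle files that you created yourself or otherwise trust, because loading a pickle can execute arbitrary code.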

Martin Breuss

March 17, 2020

I think there’s a slice too much in this example:

tweet_tokens = twitter_samples.tokenized('positive_tweets.json')[0]

print(tweet_tokens[0])

It seems you wanted to show a single example tweet, so it makes sense to keep the [0] in your print() call but remove it from the line above. Otherwise tweet_tokens becomes less useful.
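The difference between the two placements of `[0]` can be seen without downloading the corpus. In this sketch, `all_tweets` is a hypothetical stand-in for the list of token lists that `twitter_samples.tokenized()` returns, and the variable names `first_only` and `every_tweet` are made up for the comparison.

```python
# Hypothetical stand-in for twitter_samples.tokenized('positive_tweets.json'),
# which returns one list of tokens per tweet.
all_tweets = [
    ["Thanks", "for", "the", "follow", ":)"],
    ["Great", "day", "today", "!"],
]

# Indexing on the assignment keeps only the first tweet,
# so a second [0] yields a single token.
first_only = all_tweets[0]
print(first_only[0])   # Thanks

# Indexing only inside print() shows the first tweet
# while keeping every tweet available for later steps.
every_tweet = all_tweets
print(every_tweet[0])  # ['Thanks', 'for', 'the', 'follow', ':)']
print(len(every_tweet))  # 2
```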

toddrimes

May 2, 2020

Hi, Shaumik:

In the final code, what is

text = twitter_samples.strings('tweets.20150430-223406.json')

…for? It looks like ‘text’ is never referenced or used after that.

Thank you, ~Todd

gmehta1996

July 3, 2020

I tried the sentiment analysis with the positive and negative tweets, but I want to add more sentiments, such as sarcasm or neutral. I added 5000 neutral tweets and followed the same procedure as for positive and negative. If I do so, can I get the ratio of all three sentiments when I use the ‘classifier.show_most_informative_features(10)’ command? Currently I am getting ratios of neutral with either only positive or only negative.

The following is the output:

Most Informative Features
         :( = True    Negati : Neutra = 1864.7 : 1.0
         :) = True    Positi : Negati =  847.0 : 1.0
         rt = True    Neutra : Negati =  807.8 : 1.0
         :d = True    Positi : Neutra =  672.7 : 1.0
        :-) = True    Positi : Neutra =  215.0 : 1.0
          … = True    Neutra : Negati =  198.0 : 1.0
       tory = True    Neutra : Positi =  108.7 : 1.0
    morning = True    Positi : Neutra =  104.0 : 1.0
     rather = True    Neutra : Negati =   99.9 : 1.0
       deal = True    Neutra : Positi =   84.4 : 1.0

How do I compare all three together? And if I add more sentiments, how do I compare their ratios to each other?
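Each row of that output names only two labels, which is why the three sentiments never appear side by side. To see all labels at once for a given input, one option is the classifier's `prob_classify()` method, which returns a probability for every label. The following is a minimal sketch with a hypothetical six-example training set; the feature names and labels are made up for illustration.

```python
from nltk import NaiveBayesClassifier

# Hypothetical three-class training set.
training_set = [
    ({":)": True, "love": True}, "Positive"),
    ({":)": True, "great": True}, "Positive"),
    ({":(": True, "hate": True}, "Negative"),
    ({":(": True, "awful": True}, "Negative"),
    ({"rt": True, "news": True}, "Neutral"),
    ({"rt": True, "report": True}, "Neutral"),
]
classifier = NaiveBayesClassifier.train(training_set)

# prob_classify() returns a probability distribution over every label.
distribution = classifier.prob_classify({":)": True, "great": True})
probabilities = {label: distribution.prob(label) for label in classifier.labels()}

for label, probability in sorted(probabilities.items()):
    print(f"{label}: {probability:.3f}")
```

The same loop works unchanged if more sentiment labels are added to the training data, since it iterates over whatever `classifier.labels()` reports.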

messyaryal

August 13, 2020

The obtained accuracy is very high, so I was wondering what made the model that accurate when it does not even handle double-negation sentences. Does the data contain outliers, or is there something else?
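The double-negation limitation can be illustrated directly. Assuming the features are a plain bag of words, with each token mapped to `True` (the hypothetical `to_features` helper below is an assumption about the feature format, not code quoted from the tutorial), repeated words and word order are both lost before the classifier ever sees the text.

```python
def to_features(tokens):
    """Bag-of-words features: each distinct token maps to True."""
    return {token: True for token in tokens}


plain = to_features("this is not good".split())
double_negation = to_features("this is not not good".split())
reordered = to_features("good this is not".split())

# The repeated "not" collapses into one key, and dictionaries ignore order,
# so all three sentences produce identical features.
print(plain == double_negation)  # True
print(plain == reordered)        # True
```

A model trained on such features can still score well on a test set if individual tokens are strongly tied to one label; the feature list quoted in an earlier comment, dominated by emoticons such as `:(` and `:)`, suggests that may be the case for this dataset.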

konijetinikhil

November 27, 2020

Which other classifiers in NLTK can we use here in place of Naive Bayes?
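NLTK ships other classifiers that accept the same list of (feature dictionary, label) pairs, including `nltk.DecisionTreeClassifier` and `nltk.MaxentClassifier`, plus an `SklearnClassifier` wrapper for scikit-learn models. As a minimal sketch, assuming a tiny hypothetical training set, a decision tree can be swapped in like this:

```python
from nltk import DecisionTreeClassifier

# Tiny hypothetical training set in the usual NLTK format.
training_set = [
    ({"love": True, ":)": True}, "Positive"),
    ({"great": True, ":)": True}, "Positive"),
    ({"hate": True, ":(": True}, "Negative"),
    ({"awful": True, ":(": True}, "Negative"),
]

# Same train/classify calls as with NaiveBayesClassifier.
classifier = DecisionTreeClassifier.train(training_set)
prediction = classifier.classify({"great": True, ":)": True})
print(prediction)
```

Whether any of these alternatives beats Naive Bayes on the tweet data is something to measure on a held-out test set rather than assume.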

eldesastre

January 28, 2021

Great tutorial, this is very much appreciated!

elbowsoffthetable

July 28, 2021

One of the cleanest, most well-thought-out tutorials I have seen, if not THE cleanest! Thanks for taking the time and going to the trouble to get it right. Very helpful!

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

URL: https://www.digitalocean.com/community/tutorials/how-to-perform-sentiment-analysis-in-python-3-using-the-natural-language-toolkit-nltk?comment=84040