
URL: https://www.digitalocean.com/community/tutorials/how-to-perform-sentiment-analysis-in-python-3-using-the-natural-language-toolkit-nltk?comment=85639



How To Perform Sentiment Analysis in Python 3 Using the Natural Language Toolkit (NLTK)

Published on September 27, 2019

The author selected the Open Internet/Free Speech fund to receive a donation as part of the Write for DOnations program.

Introduction

Much of the data generated today is unstructured and must be processed before it yields insights. Some examples of unstructured data are news articles, posts on social media, and search history. The process of analyzing natural language and making sense of it falls under the field of Natural Language Processing (NLP). Sentiment analysis is a common NLP task, which involves classifying texts or parts of texts into pre-defined sentiment categories. You will use the Natural Language Toolkit (NLTK), a commonly used NLP library in Python, to analyze textual data.

In this tutorial, you will prepare a dataset of sample tweets from the NLTK package for NLP using several data cleaning methods. Once the dataset is ready for processing, you will train a model on pre-classified tweets and use the model to classify the sample tweets into negative and positive sentiments.

This article assumes that you are familiar with the basics of Python (see our How To Code in Python 3 series), primarily the use of data structures, classes, and methods. The tutorial assumes that you have no background in NLP or NLTK, although some familiarity with either is an advantage.


About the author(s)

Shaumik is an optimist, but one who carries an umbrella. An undergrad at IITR, he loves writing, when he's not busy keeping the blue flag flying high.


Hi Shaumik,

Thank you very much for this brilliant tutorial. I’m in the process of developing a few custom tools for Alteryx and this tutorial was absolutely legendary!!

Really interesting read, though I wonder about the speed. Would hosting this as a service behind an API endpoint on Lambda or Cloud Functions make the feedback fast enough for real-world scenarios, or do you have any other tips on the matter?

So how can we alter the logic so that the training part only needs to run once? It takes a lot of time and resources, and in real-life scenarios usually only the custom sentence changes.
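One common way to train only once is to save the trained model to disk and load it on later runs. The sketch below is a minimal illustration using Python's standard pickle module. The helper name load_or_train, the file name classifier.pickle, and the placeholder train_model function are all assumptions made for this example, not part of the tutorial's code.

```python
import os
import pickle
import tempfile


def load_or_train(model_path, train_fn):
    """Return the model stored at model_path, training it only if the file is missing."""
    if os.path.exists(model_path):
        with open(model_path, "rb") as f:
            return pickle.load(f)
    model = train_fn()
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return model


# Hypothetical stand-in for the expensive training step.
# It records each call so we can see how often training really runs.
training_runs = []


def train_model():
    training_runs.append(1)
    return {"labels": ["Positive", "Negative"]}  # placeholder "model"


# A temporary directory keeps the demo from touching real files.
model_path = os.path.join(tempfile.mkdtemp(), "classifier.pickle")

first = load_or_train(model_path, train_model)   # trains, then saves
second = load_or_train(model_path, train_model)  # loads from disk

print(len(training_runs))  # number of times training actually ran
```

With the tutorial's classifier, train_model would presumably wrap the NaiveBayesClassifier.train(train_data) call, and a long-running service could load the file once at startup so that each request only pays for classification. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.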

I think there’s a slice too much in this example:

tweet_tokens = twitter_samples.tokenized('positive_tweets.json')[0]

print(tweet_tokens[0])

Seems to me you wanted to show a single example tweet, so makes sense to keep the [0] in your print() function, but remove it from the line above. Otherwise tweet_tokens becomes less useful.
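To make the difference concrete, here is a small sketch that uses a hand-made stand-in for the value returned by twitter_samples.tokenized(), which is a list of tweets where each tweet is a list of tokens. The sample tokens below are invented for illustration and are not taken from the corpus.

```python
# Stand-in for twitter_samples.tokenized('positive_tweets.json'):
# a list of tweets, each tweet being a list of token strings.
# These tokens are made up for illustration.
all_tweets = [
    ["#FollowFriday", "thanks", "for", "being", "top", "engaged", ":)"],
    ["Hey", "James", "!", "How", "odd", ":/"],
]

# With the extra [0], the variable holds only the first tweet,
# so indexing it again yields a single token.
tweet_tokens = all_tweets[0]
first_token = tweet_tokens[0]

# Without the extra [0], the variable keeps every tweet,
# and indexing it yields the whole first tweet.
tweet_tokens = all_tweets
first_tweet = tweet_tokens[0]

print(first_token)
print(first_tweet)
```

Keeping the full list in tweet_tokens means later steps can still loop over every tweet.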

Hi, Shaumik:

In the final code, what is

text = twitter_samples.strings('tweets.20150430-223406.json')

…for? It looks like ‘text’ is never referenced or used after that.

Thank you, ~Todd

I tried the sentiment analysis with the positive and negative tweets, but I want to add more sentiments, such as sarcasm or neutral. I added 5000 neutral tweets and followed the same procedure as for the positive and negative ones. If I do so, can I get the ratio of all three sentiments when I use the ‘classifier.show_most_informative_features(10)’ command? Currently I am getting ratios of neutral with either only positive or only negative.

The following is the output:

Most Informative Features
        :( = True    Negati : Neutra = 1864.7 : 1.0
        :) = True    Positi : Negati =  847.0 : 1.0
        rt = True    Neutra : Negati =  807.8 : 1.0
        :d = True    Positi : Neutra =  672.7 : 1.0
       :-) = True    Positi : Neutra =  215.0 : 1.0
         … = True    Neutra : Negati =  198.0 : 1.0
      tory = True    Neutra : Positi =  108.7 : 1.0
   morning = True    Positi : Neutra =  104.0 : 1.0
    rather = True    Neutra : Negati =   99.9 : 1.0
      deal = True    Neutra : Positi =   84.4 : 1.0

How do I compare all three together? And if I add more sentiments, how do I compare their ratios to each other?
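As far as the output suggests, show_most_informative_features prints one ratio per feature, between the label where the feature is most likely and the label where it is least likely, so with three labels one of them is always left out of a given row. One possible approach is to estimate the likelihood of a feature under every label yourself and then print every pairwise ratio. The sketch below does this in plain Python on a tiny invented data set. The function names and the add-0.5 smoothing are assumptions chosen for illustration, and the numbers will not match NLTK's internal estimates exactly.

```python
from collections import Counter


def feature_likelihoods(labeled_featuresets, feature):
    """Estimate P(feature is present | label) for every label.

    Uses add-0.5 smoothing over two outcomes (present / absent) so that
    no probability is exactly zero. This is an illustrative estimate,
    not NLTK's internal computation.
    """
    present = Counter()
    total = Counter()
    for features, label in labeled_featuresets:
        total[label] += 1
        if features.get(feature):
            present[label] += 1
    return {
        label: (present[label] + 0.5) / (total[label] + 1.0)
        for label in total
    }


def pairwise_ratios(likelihoods):
    """Return {(a, b): P(f|a) / P(f|b)} for every ordered pair of distinct labels."""
    return {
        (a, b): likelihoods[a] / likelihoods[b]
        for a in likelihoods
        for b in likelihoods
        if a != b
    }


# Tiny invented data set in the (feature dict, label) format
# that NLTK classifiers are trained on.
data = [
    ({":)": True}, "Positive"),
    ({":)": True, "morning": True}, "Positive"),
    ({":(": True}, "Negative"),
    ({"rt": True}, "Neutral"),
    ({"rt": True, ":)": True}, "Neutral"),
    ({"rt": True}, "Neutral"),
]

likelihoods = feature_likelihoods(data, ":)")
ratios = pairwise_ratios(likelihoods)
for (a, b), ratio in sorted(ratios.items()):
    print(f"{a} : {b} = {ratio:.2f} : 1.0")
```

The same two functions work for any number of labels, since they loop over whatever labels appear in the data.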

The obtained accuracy is very high, so I was wondering what makes the model that accurate when it does not even handle double-negation sentences. Does the data contain outliers, or is there something else?

What other classifiers in NLTK can we use here in place of Naive Bayes?

Great tutorial, this is very much appreciated!

One of, if not THE cleanest, most well-thought-out tutorials I have seen! Thanks for taking the time and going to the trouble to get it right. Very helpful!

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.