April 06, 2018
Sentiment analysis is formally defined as “a process of computationally identifying and categorizing emotions and opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative, or neutral.”
Some real-world applications of sentiment analysis are social media monitoring (how people are reacting to certain things on social media), marketing (whether people like your product, judged by analyzing product reviews), and political analysis (which politician is more popular among the majority of people), among many others.
Twitter is a great source for collecting and analyzing thousands of diverse opinions and emotions expressed by real people all over the world, on a diverse range of topics, every single second of every single day. Besides being a great repository of data published by real people, tweets are ideal for sentiment analysis for two other reasons: i) they are easy to collect and categorize, and ii) they are short (originally capped at 140 characters, raised to 280 in late 2017), so they take up relatively little memory.
First, you need a computer, obviously. You need to have Python and its associated environment set up on your machine, and a text editor to write your script in. If you are confused, check out my other article for a watered-down explanation of getting everything necessary installed.
If you have never heard of an API before, it is basically a set of functions that lets your program access the features or data of another service, Twitter in our case. First, you need to create a Twitter Application on Twitter's application management page.
Click on Create New App and fill in the details.
For this project, we will use two libraries: tweepy and textblob. Tweepy is an easy-to-use Python library for accessing the Twitter API, and TextBlob is a library for processing textual data, which we will use to measure the sentiment of our tweets. The installation process is fairly simple: use pip (check my other article if you don’t know how to install Python packages) and run pip install tweepy textblob.
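Before going further, it can help to confirm that both libraries are visible to your interpreter. The snippet below is a minimal sketch using only the standard library; the helper name missing_packages is made up for this example.

```python
import importlib.util


def missing_packages(names):
    """Return the package names from `names` that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The two third-party libraries this tutorial relies on.
    missing = missing_packages(["tweepy", "textblob"])
    if missing:
        print("Still missing, try: pip install " + " ".join(missing))
    else:
        print("tweepy and textblob are both installed.")
```

If the script reports anything missing, run the suggested pip command and try again.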
Next, we need to authenticate our program with the Twitter API. If you go back to your Twitter Application and head over to the “Keys and Access Tokens” tab, you will see four values: the consumer key, the consumer secret, the access token and the access token secret (which I have screened out because they are my precious 😇). You will need all four in your script to create your authentication channel.
To include them in your script, create four variables and assign them your magical access strings.
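As a rough sketch, the four variables could look like this. The variable names and the placeholder strings are assumptions for illustration; paste in the values from your own application.

```python
# Placeholder values: replace each string with the matching value from the
# "Keys and Access Tokens" tab of your own Twitter Application.
# Treat these like passwords and keep them out of public repositories.
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"
```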
Next, we will use a class called OAuthHandler from the tweepy library and pass our consumer key and consumer secret as the parameters. For our project, we don’t need to know the details of this class (because bro, abstractions). Let’s just say tweepy uses it to sign our requests to Twitter behind the scenes. We then have to call the set_access_token method of our auth variable and pass our access token and access token secret as arguments.
After that, we will create our magical variable that will help us use all the API services we need for our project. We will call it api and assign it an instance of tweepy's API class, which we construct with one argument: auth.
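Put together, the authentication steps might look something like the sketch below. The function name build_api and its parameter names are made up for this example; OAuthHandler, set_access_token and API are the tweepy names mentioned above.

```python
def build_api(consumer_key, consumer_secret, access_token, access_token_secret):
    """Create an authenticated tweepy API object from the four credentials."""
    # Imported inside the function so this file still loads if tweepy
    # has not been installed yet.
    import tweepy

    # Step 1: the handler is created from the consumer key and secret.
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    # Step 2: the access token pair is attached to the same handler.
    auth.set_access_token(access_token, access_token_secret)
    # Step 3: the API object wraps the handler and is what we call from now on.
    return tweepy.API(auth)
```

You would call it once, for example api = build_api(consumer_key, consumer_secret, access_token, access_token_secret), and reuse the returned api object for every request.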
Now that we have established our access to the Twitter API, we can start doing a bunch of things. For starters, we will search for tweets. How about Trump? lol. We will create a query variable and assign it the string Trump, so that our program looks for tweets talking about Trump and fetches them for us. We can also set an upper limit on the number of tweets at which our program should stop. How about we go with ten thousand!
Tweepy provides Cursor objects, a way of iterating through timelines, user lists, direct messages, etc., that handles pagination for us. You can read its docs to know more about what you can do with it. We pass our api’s search method and the query to tweepy's Cursor to tell it what to look for, and then tell it how many items (tweets) to fetch. We will assign the result to our variable public_tweets.
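Here is a hedged sketch of the query, the limit and the Cursor call. The function name fetch_tweets and its parameter names are assumptions. Note that Tweepy 3.x exposes the search endpoint as api.search, while Tweepy 4.x renamed it to api.search_tweets, so the sketch tries both.

```python
def fetch_tweets(api, query="Trump", max_tweets=10000):
    """Return an iterator over at most `max_tweets` tweets matching `query`."""
    # Imported inside the function so this file still loads if tweepy
    # has not been installed yet.
    import tweepy

    # Tweepy 4.x calls the method search_tweets; Tweepy 3.x calls it search.
    search = getattr(api, "search_tweets", None) or api.search
    # Cursor handles pagination; items() stops after max_tweets results.
    return tweepy.Cursor(search, q=query).items(max_tweets)
```

With an authenticated api object, public_tweets = fetch_tweets(api) would give the iterator that the for-loop walks through. Nothing is downloaded until you start iterating, and Twitter's rate limits may slow down a request for ten thousand tweets.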
We are almost done. To print all the tweets, we need to write a simple for-loop. I implemented two additional filters within my for-loop to only get tweets which are in English and exclude all the retweets (you can tweak the filters according to your preference).
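The loop with its two filters could be sketched as follows. This is one possible way to write the filters, not necessarily the one used in the original script. It assumes each tweet has text and lang attributes, and that retweets either carry a retweeted_status attribute or start with "RT @". The stand-in objects at the bottom are invented so the example runs without network access.

```python
from types import SimpleNamespace


def is_original_english(tweet):
    """True for tweets that are in English and are not retweets."""
    is_retweet = hasattr(tweet, "retweeted_status") or tweet.text.startswith("RT @")
    return tweet.lang == "en" and not is_retweet


def print_filtered(tweets):
    """Print the text of every tweet that passes both filters."""
    for tweet in tweets:
        if is_original_english(tweet):
            print(tweet.text)


# Invented stand-ins for tweet objects, only for trying the filter offline.
sample_tweets = [
    SimpleNamespace(text="Hello world", lang="en"),
    SimpleNamespace(text="RT @someone: Hello world", lang="en"),
    SimpleNamespace(text="Hola mundo", lang="es"),
]
print_filtered(sample_tweets)
```

In the real script you would pass public_tweets to print_filtered instead of the sample list.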
If you have done everything right, this should get you all the tweets pouring down in your console like a freakin’ sandstorm! Next, we need to calculate the sentiment of each tweet as we fetch it. Time to remember our good old friend TextBlob. A TextBlob object has a property called sentiment, which analyzes the text it was built from and calculates its Polarity (range [-1.0, 1.0], where -1.0 is very negative and 1.0 is very positive) and Subjectivity (range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective). We will create a variable, analysis, and assign it the tweet's text wrapped in TextBlob. Now we can read its sentiment property and print it along with the tweet. We just add two lines at the end of our original for-loop to get each tweet's sentiment.
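A minimal sketch of the sentiment step is shown below. The function names tweet_sentiment and print_with_sentiment are made up for this example; TextBlob and its sentiment property (with polarity and subjectivity fields) come from the textblob library.

```python
def tweet_sentiment(text):
    """Return (polarity, subjectivity) for a piece of text using TextBlob."""
    # Imported inside the function so this file still loads if textblob
    # has not been installed yet.
    from textblob import TextBlob

    analysis = TextBlob(text)
    sentiment = analysis.sentiment  # a property, so no parentheses
    return sentiment.polarity, sentiment.subjectivity


def print_with_sentiment(texts):
    """Print each text followed by its polarity and subjectivity."""
    for text in texts:
        polarity, subjectivity = tweet_sentiment(text)
        print(text)
        print(f"Polarity: {polarity}, Subjectivity: {subjectivity}")
```

Inside the tweet loop you would call tweet_sentiment(tweet.text) for every tweet that passes your filters.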
This is what you should get: Polarity and Subjectivity. We can use these sentiments to create beautiful visualizations such as pie charts and scatter plots using amazing Python libraries like matplotlib and seaborn, which could be the subject of a separate article on its own.
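As a small step towards such a chart, the polarity scores can be grouped into the positive, negative and neutral buckets from the definition at the top. The sketch below assumes the simplest possible rule (above zero is positive, below zero is negative, exactly zero is neutral), and the sample scores are invented for illustration.

```python
from collections import Counter


def polarity_label(polarity):
    """Map a polarity score in [-1.0, 1.0] to a coarse label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def tally_labels(polarities):
    """Count how many scores fall under each label."""
    return Counter(polarity_label(p) for p in polarities)


# Made-up scores, not real tweet results.
sample_polarities = [0.5, -0.2, 0.0, 0.8, 0.0]
print(tally_labels(sample_polarities))
```

The resulting counts are the kind of data a pie chart of positive, negative and neutral tweets would need. You may prefer a small dead zone around zero instead of an exact comparison.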
© Amitabha Dey. All rights reserved.