In this article we are going to learn how to do some basic sentiment analysis with Python, using a wordlist-based approach and the afinn
package.
First, you will need to install the package:
pip install afinn
or
pip3 install afinn
on Mac/Linux
You will also need to install the following packages in the same way if you haven’t already: google
, requests
, beautifulsoup
,pandas
, matplotlib
, seaborn
.
The basic idea with the afinn
package is that we have a wordlist which has a score in terms of positivity or negativity assigned to each word, ranging from -5
(very negative) to +5
(very positive).
For example:
amazes 2
amazing 4
ambitious 2
ambivalent -1
amicable 2
amuse 3
amused 3
amusement 3
amusements 3
anger -3
These scores are used as the basis for the evaluation of a text string.
The process we will follow in this lesson is:
- Use Python to programmatically perform a Google search for a given phrase
- Extract the titles from the URLs provided by the previous step
- Perform sentiment analysis on the titles
- Collate the results into a Pandas dataframe
- Display the results in a graph
It is probably best to use a Jupyter Notebook for the code in this lesson, in order to avoid having to run time-consuming operations each time you make a change, as results are stored in variables which can be reused throughout the notebook. For more information on Jupyter Notebooks, see here.
Using Python to programmatically perform a Google search for a given phrase
The first step is to get the URLs from a Google search and store them in a list. Obviously if you already have a webpage in mind or some text you wish to analyse, you can skip this step.
from googlesearch import search
query = "bunny rabbit" # Try your own search terms
num_results = 30
result_urls = []
for result in search(
query, # The query you want to run
tld="com", # The top level domain
lang="en", # The language
num=10, # Number of results per page
start=0, # First result to retrieve
stop=num_results, # Last result to retrieve
pause=2.0, # Lapse between HTTP requests
):
result_urls.append(result)
result_urls
['https://www.youtube.com/watch?v=hDJkFLnmFHU',
'https://www.youtube.com/watch?v=dpvUQagTRHM',
'https://www.rspca.org.uk/adviceandwelfare/pets/rabbits',
'https://en.wikipedia.org/wiki/Rabbit',
'https://en.wikipedia.org/wiki/Rabbit#Terminology',
'https://en.wikipedia.org/wiki/Rabbit#Taxonomy',
'https://en.wikipedia.org/wiki/Rabbit#Biology',
'https://en.wikipedia.org/wiki/Rabbit#Ecology',
'https://www.petakids.com/save-animals/never-buy-bunnies/',
...
Scraping Headlines with Python for Sentiment Analysis
Next we will use requests
and beautifulsoup
to scrape the urls retrieved in the last step, and store the results in a new list. For now we will just focus on the first h1
tag on each page visited, as this is a good place to start if we are looking for headlines.
import requests
from bs4 import BeautifulSoup
title_list = []
for url in result_urls:
try:
r = requests.get(url, timeout=3)
soup = BeautifulSoup(r.content, "html.parser")
html_element = soup.find("h1")
article_title = html_element.text.strip()
title_list.append(article_title)
except Exception as e:
pass # ignore any pages where there is a problem
title_list
'9 Reasons Why You Shouldn’t Buy a Bunny',
'My House Rabbit',
'What’s The Difference Between A Bunny, A Rabbit, And A Hare?',
'Rabbit',
'406 Not Acceptable',
'Rabbit Behavior',
'14 Fascinating Facts About Rabbits',
'Bunny Rabbit',
...
Performing Sentiment Analysis using Afinn
Now we move on to using the afinn
package to perform the actual sentiment analysis. Once we have the results, stored in lists, we create a pandas
dataframe for each display and analysis of the results.
from afinn import Afinn
import pandas as pd
af = Afinn()
# Compute sentiment scores and categories
sentiment_scores = [af.score(article) for article in title_list]
sentiment_category = ['positive' if score > 0 else 'negative' if score < 0 else 'neutral' for score in sentiment_scores]
# Create Pandas dataframe from results and display
df = pd.DataFrame([title_list, sentiment_scores, sentiment_category]).T # .T: swap rows and cols
df.columns = ['headline', 'sentiment_score', 'sentiment_category']
df['sentiment_score'] = df.sentiment_score.astype('float')
df.describe()
This gives us some descriptive statistics for the dataframe. Notice that there is an overall mean score of 0.233..
, meaning a slight positive sentiment, if our results were statistically significant (which they probably aren’t – see further down for why).
sentiment_score | |
---|---|
count | 30.000000 |
mean | 0.233333 |
std | 1.194335 |
min | -2.000000 |
25% | 0.000000 |
50% | 0.000000 |
75% | 0.000000 |
max | 4.000000 |
Here’s the dataframe itself:
headline | sentiment_score | sentiment_category | |
---|---|---|---|
0 | Before you continue to YouTube | 0.0 | neutral |
1 | Before you continue to YouTube | 0.0 | neutral |
2 | Navigation | 0.0 | neutral |
3 | Rabbit | 0.0 | neutral |
4 | Rabbit | 0.0 | neutral |
5 | Rabbit | 0.0 | neutral |
6 | Rabbit | 0.0 | neutral |
7 | Rabbit | 0.0 | neutral |
8 | 9 Reasons Why You Shouldn’t Buy a Bunny | 0.0 | neutral |
9 | My House Rabbit | 0.0 | neutral |
10 | What’s The Difference Between A Bunny, A Rabbi… | 0.0 | neutral |
11 | Rabbit | 0.0 | neutral |
12 | 406 Not Acceptable | 1.0 | positive |
13 | Rabbit Behavior | 0.0 | neutral |
14 | 14 Fascinating Facts About Rabbits | 3.0 | positive |
15 | Bunny Rabbit | 0.0 | neutral |
16 | Error\n1020 | -2.0 | negative |
17 | 0.0 | neutral | |
18 | 13 Rabbit Facts Prove the Point: Bunnies Aren’… | 0.0 | neutral |
19 | Pet Rabbits and Your Health | 0.0 | neutral |
20 | Rabbit & Bunny Soft Toys | 0.0 | neutral |
21 | A Complete Guide to the Best Rabbit Breeds | 3.0 | positive |
22 | John Lewis & Partners Bunny Rabbit Plush Soft Toy | 0.0 | neutral |
23 | Bunny vs Rabbit – Find out what’s the difference! | 0.0 | neutral |
24 | Bunny snatched: Record-holding giant rabbit st… | -2.0 | negative |
25 | 10 hopping fun rabbit facts! | 4.0 | positive |
26 | Bunny Rabbit Knitting Kit and Pattern | 0.0 | neutral |
27 | Bunny, Rabbit & Hare, Oh My! What’s The Differ… | 0.0 | neutral |
28 | KitKat Bunny opens the doors to its brand new … | 0.0 | neutral |
29 | Petfinder is currently undergoing updates to h… | 0.0 | neutral |
As you can see, a lot of what we collected is “noise”. However there is some useful data to work with. One improvement might be to remove the search term itself from the “headlines” using the pandas replace
method.
Plotting Sentiment Analysis Results Using Seaborn
Now let’s plot the results. Plotting with seaborn is a breeze. There are many types of plot available but here we will use countplot
as it meets our needs nicely.
import seaborn as sns
import matplotlib.pyplot as plt
plt.style.use("seaborn")
fig, axes = plt.subplots()
bp = sns.countplot(x="sentiment_score", data=df, palette="Set2").set_title(
f"Sentiment Analysis with Python. Search Term: {query}"
)
plt.show()
Interpreting the results
Although the results from this activity are potentially quite interesting, we should not be be too serious about any conclusions we draw from them. Generally speaking the contents of an h1
tag are insufficient to make meaningful inferences about a post’s sentiment. The main point of the article was to get you started with sentiment analysis with Python, and provide you with a few tools you can use in your own investigations. For more insightful results, perhaps focus on a single webpage or document. It’s up to you whether you collect your data manually or use something like what we did above with beautifulsoup
to scrape it from a webpage. Use the ideas here as a springboard, and have fun.
In this article we have learned how to perform basic sentiment analysis with Python. We used Python to perform a Google search and then scraped the results for headlines. We then analysed the headlines for sentiments score and created a dataframe from the results and displayed them in a graph. I hope you found the article interesting and helpful.
Happy computing!