Twitter Tweet Analysis Charts (Python)

Tweet Text Analysis V2 in Python

Imagine you are a social media influencer, about to launch a big new Twitter campaign. You have your hashtag lined up and it is about to go live when it hits you: how do you know the tweets mentioning the campaign hashtag are positive? What if the hashtag is being used for negative tweets? For every great Twitter campaign there have been many that have not gone to plan; see, for example, https://www.theguardian.com/technology/shortcuts/2012/nov/22/twitter-susan-boyle-susanalbumparty.

With this in mind I decided to use my Python and Azure knowledge to see if I could write a small Python program to track a hashtag and report back.

First up, I needed a trending hashtag. As I started on this project #boycottheck was trending, so I will use that hashtag for this exercise. NOTE: geektechstuff.com is not a political site. If you came here to promote or demote a political candidate, then you are at the wrong site.

#boycottheck hashtag on Twitter

Using the Twython Python module I can search Twitter for the hashtag and record the results into a prepared CSV file with appropriate headings. I am going to read my Twitter account details from a file called auth.py.

Prepared CSV file with appropriate headings
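A minimal sketch of how such a CSV file could be prepared before the first run. The column names tweet_language and azure_feedback are the ones the charting functions further down read; the other three names, and the helper prepare_csv itself, are assumptions for illustration only.

```python
import csv

# Heading names: tweet_language and azure_feedback match the columns the
# charting functions read; the remaining names are illustrative assumptions.
HEADINGS = [
    "tweet_screen_name",
    "tweet_language",
    "tweeted_time",
    "tweet_text",
    "azure_feedback",
]


def prepare_csv(path="twitter_data.csv"):
    """Create the CSV file with a heading row (overwrites an existing file)."""
    with open(path, "w", newline="", encoding="utf-8") as datafile:
        csv.writer(datafile).writerow(HEADINGS)
```

Calling prepare_csv() once gives the later append-mode writes a heading row to sit under. Because it opens the file in write mode, running it again would wipe any tweets already collected.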

I have posted the Python for this project below, and as WordPress is not always the best at displaying Python code I have also included links to the project on my GitHub.

Tweet Text Analysis V2

——

#!/usr/bin/python3
# geektechstuff
# Twitter Tweet Analysis V2
# modules to handle connecting to Twitter and saving to file
from twython import Twython
import requests
import csv
# modules to handle data analysis
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# time to make a delay
import time
# Azure API endpoint (placeholder: the URL is not shown in this post)
sentiment_uri = "AZURE_ENDPOINT_HERE"
# Azure key
headers = {"Ocp-Apim-Subscription-Key": "AZURE_KEY_HERE"}
# imports the Twitter API keys from a .py file called auth
from auth import (
    consumer_key,
    consumer_secret,
    access_token,
    access_token_secret
)
# sets up a variable called twitter that calls the relevant Twython modules
twitter = Twython(
    consumer_key,
    consumer_secret,
    access_token,
    access_token_secret
)
def get_tweets(search_term):
    # list to hold ids whilst checking them
    ids_seen = []
    # opens the txt file containing previously seen tweets
    with open('ids_seen.txt', 'r') as filehandle:
        filecontents = filehandle.readlines()
    for line in filecontents:
        # remove the trailing linebreak (and any stray whitespace)
        current_place = line.strip()
        # add item to the list, skipping blank lines
        if current_place:
            ids_seen.append(current_place)
    # this searches Twitter
    results = twitter.cursor(twitter.search, q=search_term)
    # this then pulls each individual tweet
    for result in results:
        tweet_id = result['id_str']
        if tweet_id in ids_seen:
            print(tweet_id, "Skipping as already seen")
        else:
            time.sleep(1)
            # Tweet details that I may be interested in
            tweet_text = result['text']
            tweeted_time = result['created_at']
            name = result['user']
            tweet_screen_name = name['screen_name']
            tweet_language = result['lang']
            ids_seen.append(tweet_id)
            # Preparing the data to give to Azure
            documents = {'documents': [
                {'id': tweet_id, 'language': 'en', 'text': tweet_text},
            ]}
            # Sending the data to Azure
            response = requests.post(sentiment_uri, headers=headers, json=documents)
            try:
                # Getting response back from Azure
                azure_response = response.json()
                # Stripping the score out
                azure_documents = azure_response['documents']
                azure_score = azure_documents[0]['score']
                # azure scores may need reviewing
                if azure_score >= 0.6:
                    azure_feedback = "positive"
                elif azure_score == 0.5:
                    azure_feedback = "neutral"
                else:
                    azure_feedback = "negative"
            except (KeyError, IndexError, TypeError, ValueError):
                # a missing or malformed score is recorded as ERROR
                azure_feedback = "ERROR"
            # saves id so it does not get checked more than once
            with open('ids_seen.txt', 'w') as filehandle:
                filehandle.writelines("%s\n" % place for place in ids_seen)
            # saves the data of the tweet
            with open('twitter_data.csv', 'a', newline='') as datawrite:
                csv_write = csv.writer(datawrite, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
                csv_write.writerow([tweet_screen_name, tweet_language, tweeted_time, tweet_text, azure_feedback])
def twitter_analysis_language():
    # reads csv created in get_tweets function
    filehandle = pd.read_csv("twitter_data.csv")
    tweet_lang = filehandle['tweet_language'].value_counts()
    tweet_lang_index = tweet_lang.index
    # start a fresh figure so this chart is not drawn over an earlier one
    plt.figure()
    plt.bar(tweet_lang_index, tweet_lang)
    plt.xlabel("Language")
    plt.ylabel("Number of Tweets")
    plt.title("Tweet Language Breakdown")
    # saves bar chart as a PDF
    plt.savefig('lang.pdf')
    plt.close()
def twitter_analysis_sentiment():
    # reads csv created in get_tweets function
    filehandle = pd.read_csv("twitter_data.csv")
    tweet_feedback = filehandle['azure_feedback'].value_counts()
    tweet_feedback_index = tweet_feedback.index
    # start a fresh figure so this chart is not drawn over an earlier one
    plt.figure()
    plt.bar(tweet_feedback_index, tweet_feedback)
    plt.xlabel("Feedback")
    plt.ylabel("Number of Tweets")
    plt.title("Tweet Feedback Breakdown")
    # saves bar chart as a PDF
    plt.savefig('feedback.pdf')
    plt.close()
Running the function get_tweets("#boycottheck") collects a lot of tweets and then runs them through Azure’s text analytics service. I cancelled it after a few minutes as I am on the free tier of Azure and only get a few thousand calls every month.
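Rather than cancelling by hand, the number of tweets processed could be capped up front. The sketch below uses itertools.islice on a stand-in generator; take_first and fake_results are hypothetical names, and fake_results only imitates the stream that twitter.cursor() returns so the example runs without Twitter credentials.

```python
from itertools import islice


def take_first(results, limit):
    """Return an iterator over at most `limit` items from `results`."""
    return islice(results, limit)


def fake_results():
    """Stand-in for twitter.cursor(...): an endless stream of tweet-like dicts."""
    n = 0
    while True:
        n += 1
        yield {"id_str": str(n), "text": "example tweet %d" % n}


# Only five items are ever pulled from the endless generator.
capped = list(take_first(fake_results(), 5))
print(len(capped))
```

In get_tweets the same idea would mean looping over take_first(results, limit) instead of results, so that each run makes a known maximum number of Azure calls.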
I created two functions to chart the results. The first, twitter_analysis_language(), looks at the languages used in the tweets.

Tweet Language Breakdown

I did this because the program passes every tweet to Azure marked as English ('language': 'en'), so I want to make sure the majority of the tweets are in the language I’m expecting (in this case, en or English). Twitter reports back "und" (undetermined) when no language can be detected.
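To put a number on "the majority", the share of each language code can be worked out before anything is sent to Azure. This is a plain-Python sketch on made-up sample data; language_share is a hypothetical helper and is not part of the program above.

```python
from collections import Counter


def language_share(languages):
    """Return {language_code: fraction_of_tweets} for a list of codes."""
    counts = Counter(languages)
    total = sum(counts.values())
    return {lang: count / total for lang, count in counts.items()}


# Made-up sample of Twitter language codes, including "und" (undetermined).
sample = ["en", "en", "en", "und", "es"]
share = language_share(sample)
print(share)
```

If the English share turned out to be low, the tweets could be filtered on the lang value before the Azure call, which would also save calls on the free tier.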

Next up is twitter_analysis_sentiment() which looks at the results back from Azure.

Twitter text analysis chart
Why the ERROR? Well, I added a try/except as I think Azure was returning an error or NaN (Not a Number) for some of the replies, and I wanted to catch these with an ERROR message. This could be because the tweet contained characters that Azure did not like, or there could have been another reason. I added a delay (time.sleep()) of a second in case it was the amount of data going through Azure, but the ERROR continues. I may look at this in the future.
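One way to start investigating would be to keep whatever message Azure sends back instead of discarding it. The sketch below assumes the response is a JSON object with a "documents" list and an "errors" list whose entries carry a "message" field; that layout is an assumption to check against the API version in use, and extract_score is a hypothetical helper.

```python
def extract_score(azure_response):
    """Return (score, error_message); one of the two is always None."""
    if not isinstance(azure_response, dict):
        return None, "response was not a JSON object"
    documents = azure_response.get("documents") or []
    if documents and "score" in documents[0]:
        return documents[0]["score"], None
    # Assumed layout: a list of error entries, each with a "message" field.
    errors = azure_response.get("errors") or []
    if errors:
        first = errors[0]
        if isinstance(first, dict):
            return None, str(first.get("message", first))
        return None, str(first)
    return None, "no score and no error details in response"


# Hand-written sample responses, not real Azure output.
ok = {"documents": [{"id": "1", "score": 0.9}], "errors": []}
bad = {"documents": [], "errors": [{"id": "1", "message": "hypothetical error text"}]}
print(extract_score(ok))
print(extract_score(bad))
```

Writing the returned message to the CSV, or printing it, would show whether the failures come from the tweet text, the language setting, or the call limit.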
From the returned chart the positive / negative tweets look about equal; however, this is because of some settings in my program that would need tweaking depending on use. Those settings are the Azure score thresholds.
Azure Score

This program uses the Azure text sentiment scoring system, which runs from 0 (negative) through 0.5 (neutral) to 1 (positive). I’ve narrowed this so that anything that is 0.6 or above is positive, exactly 0.5 is neutral, and anything else (including scores between 0.5 and 0.6) is negative. This has led to some false-positives:

False Positives
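The thresholds described above could be pulled out into one adjustable function. This is a sketch rather than the program's actual logic: label_score is a hypothetical name, the 0.6 cut-off comes from the program, and the 0.4 lower cut-off is an assumption chosen only to give neutral a band instead of a single value.

```python
def label_score(score, negative_below=0.4, positive_from=0.6):
    """Map a 0..1 sentiment score to a label using two adjustable cut-offs."""
    if score >= positive_from:
        return "positive"
    if score < negative_below:
        return "negative"
    # Everything between the two cut-offs counts as neutral.
    return "neutral"


for value in (0.1, 0.5, 0.55, 0.6, 0.95):
    print(value, label_score(value))
```

With these defaults a score of 0.55 is labelled neutral, whereas the original if/elif chain would label it negative. Moving either cut-off changes how many tweets land in each bar of the chart.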

This may be fixable by adjusting those numbers or by using V3 of the Azure text sentiment analyser, which can return sentiment labeling:

Sentiment Labeling
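If V3 is used, the label could be read straight from the response rather than derived from a score. The sketch below assumes each returned document has a "sentiment" label and a "confidenceScores" object keyed by label; those field names are assumptions that should be checked against the exact V3 version being called, and label_from_v3 is a hypothetical helper.

```python
def label_from_v3(document):
    """Return (label, confidence) from one V3-style document entry."""
    sentiment = document.get("sentiment")
    # Assumed field name; some versions may name or shape this differently.
    scores = document.get("confidenceScores", {})
    return sentiment, scores.get(sentiment)


# Hand-written sample in the assumed V3 layout, not real Azure output.
sample_document = {
    "id": "1",
    "sentiment": "negative",
    "confidenceScores": {"positive": 0.02, "neutral": 0.08, "negative": 0.90},
}
print(label_from_v3(sample_document))
```

The label would then be written to the azure_feedback column as-is, which removes the need for hand-picked thresholds.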

Further Reading:

The files for this project can be found on my GitHub at: https://github.com/geektechdude/Tweet_Text_Analysis , particularly https://github.com/geektechdude/Tweet_Text_Analysis/blob/master/geektechstuff_tweet_text_analysis_v2.py

I discussed setting up Text Analytics on Microsoft Azure here: https://geektechstuff.com/2019/02/26/analysing-tweet-sentiment-with-azure-python/

and I previously looked at creating charts from data via Python here:

https://geektechstuff.com/2019/03/12/doing-more-with-csv-data-python/