Twitter Tweet Analysis Charts (Python)

Imagine you are a social media influencer, about to launch a big new Twitter campaign. You have your hashtag lined up and it is about to go live when it hits you: how do you know the tweets mentioning the campaign hashtag are positive? What if the hashtag is being used for negative tweets? For every great Twitter campaign there have been many that have not gone to plan; for one example, see https://www.theguardian.com/technology/shortcuts/2012/nov/22/twitter-susan-boyle-susanalbumparty.

With this in mind I decided to use my Python and Azure knowledge to see if I could write a small Python program to track a hashtag and report back.

First up, I needed a trending hashtag. When I started on this project #boycottheck was trending, so I used that hashtag for this exercise. NOTE: geektechstuff.com is not a political site; if you came here to promote or demote a political candidate, then you are at the wrong site.

(Image: #boycottheck hashtag on Twitter)

Using the Twython Python module, I can search Twitter for the hashtag and record the results in a prepared CSV file with appropriate headings. I am going to read my Twitter API keys from a file called auth.py.

(Image: prepared CSV file with appropriate headings)
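A minimal sketch of preparing that CSV file, assuming the header row simply names the five values get_tweets() writes, in the same order. Only tweet_language and azure_feedback are confirmed by the pandas code in the program; the other three column names (tweet_screen_name, tweeted_time, tweet_text) are assumptions borrowed from the variable names, and prepare_csv is a hypothetical helper name.

```python
import csv
import os

# Assumed header row: tweet_language and azure_feedback are the names the
# pandas charts read; the other three are borrowed from get_tweets() variables.
CSV_HEADINGS = ["tweet_screen_name", "tweet_language", "tweeted_time",
                "tweet_text", "azure_feedback"]


def prepare_csv(path="twitter_data.csv"):
    """Hypothetical helper: create the CSV with a header row if it is missing.

    Returns True if a new file was written and False if the file already
    existed, so re-running never duplicates the header or wipes old tweets.
    """
    if os.path.exists(path):
        return False
    # newline="" stops the csv module producing blank rows on Windows;
    # utf-8 copes with emoji and non-English tweets
    with open(path, "w", newline="", encoding="utf-8") as datawrite:
        csv.writer(datawrite).writerow(CSV_HEADINGS)
    return True


if __name__ == "__main__":
    prepare_csv()
```

Calling prepare_csv() once before the first get_tweets() run gives pandas the column names it looks up later.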

I have posted the Python for this project below, and as WordPress is not always the best at displaying Python code I have also included links to the project on my GitHub.

(Image: Tweet Text Analysis V2)

——

#!/usr/bin/python3
# geektechstuff
# Twitter Tweet Analysis V2

# modules to handle connecting to Twitter and saving to file
from twython import Twython
import requests
import csv

# modules to handle data analysis and charts
import pandas as pd
import matplotlib.pyplot as plt

# time to make a delay between Azure calls
import time

# Azure Text Analytics (v2.0) sentiment endpoint
sentiment_uri = "https://uksouth.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment"

# Azure subscription key
headers = {"Ocp-Apim-Subscription-Key": "AZURE_KEY_HERE"}

# imports the Twitter API keys from a .py file called auth
from auth import (
    consumer_key,
    consumer_secret,
    access_token,
    access_token_secret,
)

# sets up a variable called twitter that holds an authenticated Twython client
twitter = Twython(
    consumer_key,
    consumer_secret,
    access_token,
    access_token_secret,
)

def get_tweets(search_term):
    # set holding the IDs of tweets that have already been checked
    ids_seen = set()

    # reads the txt file of previously seen tweet IDs
    # (skipped on the first run, before the file exists)
    try:
        with open('ids_seen.txt', 'r') as filehandle:
            for line in filehandle:
                # strip() removes the line break; line[:-1] would cut off a
                # digit if the last line had no line break
                ids_seen.add(line.strip())
    except FileNotFoundError:
        pass

    # this searches Twitter; cursor() keeps fetching further pages of results
    results = twitter.cursor(twitter.search, q=search_term)

    # this then pulls each individual tweet
    for result in results:
        tweet_id = result['id_str']
        if tweet_id in ids_seen:
            print(tweet_id, "Skipping as already seen")
            continue

        # one second pause before each Azure call
        time.sleep(1)

        # Tweet details that I may be interested in
        tweet_text = result['text']
        tweeted_time = result['created_at']
        tweet_screen_name = result['user']['screen_name']
        tweet_language = result['lang']
        ids_seen.add(tweet_id)

        # Preparing the data to give to Azure
        documents = {'documents': [
            {'id': tweet_id, 'language': 'en', 'text': tweet_text},
        ]}

        # Sending the data to Azure
        response = requests.post(sentiment_uri, headers=headers, json=documents)

        # Getting the response back from Azure and stripping the score out
        try:
            azure_response = response.json()
            azure_score = azure_response['documents'][0]['score']
            # azure scores may need reviewing: only >= 0.6 is positive and
            # only exactly 0.5 is neutral, so e.g. 0.55 ends up negative
            if azure_score >= 0.6:
                azure_feedback = "positive"
            elif azure_score == 0.5:
                azure_feedback = "neutral"
            else:
                azure_feedback = "negative"
        except (ValueError, KeyError, IndexError, TypeError):
            # no usable score came back (e.g. an error response from Azure)
            azure_feedback = "ERROR"

        # saves the IDs so a tweet does not get checked more than once
        with open('ids_seen.txt', 'w') as filehandle:
            filehandle.writelines("%s\n" % place for place in ids_seen)

        # saves the data of the tweet (newline='' avoids blank rows on
        # Windows, utf-8 handles emoji and non-English text)
        with open('twitter_data.csv', 'a', newline='', encoding='utf-8') as datawrite:
            csv_write = csv.writer(datawrite, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
            csv_write.writerow([tweet_screen_name, tweet_language, tweeted_time, tweet_text, azure_feedback])

def twitter_analysis_language():
    # reads csv created in get_tweets function
    df = pd.read_csv("twitter_data.csv")
    tweet_lang = df['tweet_language'].value_counts()

    # starts a fresh figure so bars from another chart are not drawn on top
    plt.figure()
    plt.bar(tweet_lang.index, tweet_lang.values)
    plt.xlabel("Language")
    plt.ylabel("Number of Tweets")
    plt.title("Tweet Language Breakdown")

    # saves bar chart as a PDF, then closes the figure
    plt.savefig('lang.pdf')
    plt.close()

def twitter_analysis_sentiment():
    # reads csv created in get_tweets function
    df = pd.read_csv("twitter_data.csv")
    tweet_feedback = df['azure_feedback'].value_counts()

    # starts a fresh figure so bars from another chart are not drawn on top
    plt.figure()
    plt.bar(tweet_feedback.index, tweet_feedback.values)
    plt.xlabel("Feedback")
    plt.ylabel("Number of Tweets")
    plt.title("Tweet Feedback Breakdown")

    # saves bar chart as a PDF, then closes the figure
    plt.savefig('feedback.pdf')
    plt.close()

—

Running get_tweets("#boycottheck") collects a lot of tweets and runs each of them through Azure's text analytics service. I cancelled it after a few minutes, as I am on the free tier of Azure and only get a few thousand calls every month.
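Rather than cancelling the run by hand, the number of new tweets sent to Azure could be capped. This is a minimal sketch, assuming a hypothetical max_tweets limit and a hypothetical helper called unseen_tweets(); it uses itertools.islice from the standard library and accepts any iterable of tweet dictionaries, so it can be tried on a plain list before being pointed at the twitter.cursor(...) results.

```python
from itertools import islice


def unseen_tweets(results, ids_seen, max_tweets):
    """Return an iterator over at most max_tweets tweets not in ids_seen.

    results can be any iterable of tweet dicts (e.g. the Twython cursor used
    in get_tweets()). islice stops pulling items from results once the limit
    is reached, so the loop ends instead of running until cancelled.
    """
    fresh = (tweet for tweet in results if tweet["id_str"] not in ids_seen)
    return islice(fresh, max_tweets)


if __name__ == "__main__":
    sample = [{"id_str": "1"}, {"id_str": "2"}, {"id_str": "3"}, {"id_str": "4"}]
    for tweet in unseen_tweets(sample, ids_seen={"2"}, max_tweets=2):
        print(tweet["id_str"])  # prints 1, then 3
```

Inside get_tweets() the loop could then become something like `for result in unseen_tweets(results, ids_seen, max_tweets=100):`, which also makes the "Skipping as already seen" branch unnecessary.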

I created two analysis functions. The first, twitter_analysis_language(), looks at the languages used in the tweets.

(Image: Tweet Language Breakdown chart)

I did this because I'm going to pass the text to Azure and want to make sure that the majority of the tweets sent are in the language I'm expecting (in this case, en or English). Twitter reports back und (undetermined) when no language can be detected.
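To put a number on that "majority" before spending Azure calls, the language counts can be turned into proportions. This sketch reads the same twitter_data.csv with the standard csv module rather than pandas (pandas' value_counts(normalize=True) would give the same proportions). The function name language_share is hypothetical, and it assumes the CSV has the tweet_language header that twitter_analysis_language() reads.

```python
import csv
from collections import Counter


def language_share(path="twitter_data.csv"):
    """Return {language code: fraction of tweets} from the tweet_language column."""
    with open(path, newline="", encoding="utf-8") as datafile:
        counts = Counter(row["tweet_language"] for row in csv.DictReader(datafile))
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders languages from most to least frequent
    return {lang: n / total for lang, n in counts.most_common()}


if __name__ == "__main__":
    share = language_share()
    print("English share: {:.0%}".format(share.get("en", 0.0)))
    print("Undetermined (und) share: {:.0%}".format(share.get("und", 0.0)))
```

If only English tweets are wanted, Twitter's standard search API also accepted a lang parameter (for example `twitter.search(q=search_term, lang='en')`), which would filter the tweets before anything reaches Azure.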

Next up is twitter_analysis_sentiment(), which charts the sentiment results returned from Azure.

Why the ERROR? I added a try/except because I think Azure was returning an error or NaN (Not a Number) instead of a score for some of the tweets, so I wanted to catch these and label them ERROR. This could be because a tweet contained characters that Azure did not like, or it could have been another reason. I added a one-second delay (time.sleep(1)) in case the errors were caused by the amount of data going to Azure, but the errors continue. I may look at this in the future.
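One way to find out what is behind each ERROR is to record the reason instead of a bare label. This is a minimal sketch, assuming the v2.0 sentiment response lists failed documents in an errors array (each entry with an id and a message) next to the documents array, and treating any non-200 HTTP status (for example 429 if a rate limit is hit) as a failure. The function name explain_azure_result is hypothetical; it takes the status code and decoded JSON so it can be tested without calling Azure.

```python
def explain_azure_result(status_code, payload):
    """Return (score, None) on success, or (None, reason) when no score came back.

    Assumes the Text Analytics v2.0 sentiment response shape:
    {"documents": [{"id": ..., "score": ...}], "errors": [{"id": ..., "message": ...}]}
    """
    if status_code != 200:
        # the request as a whole failed (bad key, rate limit, ...)
        return None, "HTTP {}: {}".format(status_code, payload)
    if not isinstance(payload, dict):
        return None, "unexpected response: {!r}".format(payload)
    errors = payload.get("errors") or []
    if errors:
        # a per-document error explains why this tweet has no score
        return None, errors[0].get("message", "unknown document error")
    documents = payload.get("documents") or []
    if not documents or "score" not in documents[0]:
        return None, "no score in response"
    score = documents[0]["score"]
    if score is None or score != score:  # NaN is the only value not equal to itself
        return None, "score is NaN/None"
    return score, None
```

In get_tweets() this could be called as something like `score, reason = explain_azure_result(response.status_code, response.json())`, printing or saving reason whenever score is None.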

From the returned chart the positive and negative tweets look about equal; however, this is because of some settings in my program that would need tweaking depending on use. Those settings are the Azure score thresholds.
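One way to make the score thresholds easier to tweak is to pull them out into parameters. The sketch below uses a hypothetical function name, score_to_feedback; its defaults reproduce the thresholds currently hard-coded in get_tweets(), and widening neutral_low/neutral_high turns the single neutral value into a band. The 0.4 to 0.6 band in the example is purely illustrative, not a recommended setting.

```python
def score_to_feedback(score, positive_from=0.6, neutral_low=0.5, neutral_high=0.5):
    """Map an Azure sentiment score (0 to 1) to a feedback label.

    The defaults match get_tweets(): >= 0.6 positive, exactly 0.5 neutral,
    anything else negative. The positive check runs first, so a score equal
    to positive_from is always positive even if it also falls in the band.
    """
    if score >= positive_from:
        return "positive"
    if neutral_low <= score <= neutral_high:
        return "neutral"
    return "negative"


if __name__ == "__main__":
    for score in (0.1, 0.45, 0.5, 0.55, 0.7):
        current = score_to_feedback(score)
        banded = score_to_feedback(score, neutral_low=0.4, neutral_high=0.6)
        print(score, current, banded)
```

With the defaults, scores of 0.45 and 0.55 both come out negative; with the 0.4 to 0.6 band, both are counted as neutral.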

This program uses the Azure text sentiment scoring system, which returns a score from 0 to 1: 0 being negative, 0.5 being neutral and 1 being positive. I've narrowed this so that anything 0.6 or above is positive, exactly 0.5 is neutral, and anything else (including a score such as 0.55) is negative. This has led to some false-positives: