Sentiment analysis on Question Period speeches using TextBlob, the Perspective API and Python

This past winter I was teaching a class on Social Networks at George Brown College, and we examined how companies use social media sentiment analysis to gain further insight into the public’s perception of their brands and products.

We went through a few great examples of how computer-generated sentiment analysis is being used in a variety of fields, including journalism. Vox’s 2016 article, in which they ran sentiment analysis on seven months of Donald Trump’s tweets, inspired me to move beyond abstract discussions about sentiment analysis and figure it out for myself.

One of the things I do in my work at the CBC is run live streams of Canada’s Question Period for the CBC News social media team. I have probably watched hundreds of QP sessions over the last few years, and while watching one day I thought back to the Toronto Star’s excellent interactive feature Parliament in Check and started thinking about ways I could use text analysis tools on these Question Period speeches.

A few times at the CBC, I have been asked to go through a popular Facebook post and find comments that were particularly insightful. This has always been a challenging exercise because it often means sifting through thousands of comments, including many problematic ones, to find meaningful ones. That got me looking for a tool that could do the sifting for us, and that is when I stumbled upon Google’s Perspective API.

Using Python and the Open Parliament API, I put all of the House of Commons speeches since the 2015 election (over 128,000 speeches and counting) into a MySQL database. Using the Perspective API and the TextBlob library, I then analyzed all of the Question Period speeches (a smaller subset of almost 40,000 speeches) for toxicity and sentiment.
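
The ingestion step looked roughly like the sketch below. This is a simplified, illustrative version: it assumes pymysql for the MySQL connection and a placeholder speeches table, and the JSON field names (objects, pagination.next_url, content.en) reflect my reading of the Open Parliament API’s response format, so double-check them against the API docs before reusing this.

```python
# Rough sketch of the ingestion step: page through the Open Parliament
# API and store each speech in MySQL. Table and column names are
# placeholders, not my actual schema.
import requests
import pymysql

conn = pymysql.connect(host="localhost", user="user",
                       password="secret", database="parliament")

url = "https://api.openparliament.ca/speeches/?format=json&limit=100"
with conn.cursor() as cur:
    while url:
        page = requests.get(url).json()
        for speech in page["objects"]:
            # Speech content comes back keyed by language; keep the English text
            cur.execute(
                "INSERT INTO speeches (politician, spoken_at, body) "
                "VALUES (%s, %s, %s)",
                (speech.get("politician_url"),
                 speech.get("time"),
                 (speech.get("content") or {}).get("en", "")),
            )
        conn.commit()
        # Follow the pagination links until there are no more pages
        next_url = page.get("pagination", {}).get("next_url")
        url = "https://api.openparliament.ca" + next_url if next_url else None
```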

My approach for this project is a bit problematic for a couple of reasons. The Perspective API is not designed to analyze human speech; it is built to analyze online comments. Its AI model, which helps determine whether a piece of text is toxic, is trained on online comments, and text that might be considered toxic in an online space may not be considered toxic in the context of spoken debate. Automated sentiment analysis faces some of the same challenges.
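
For reference, scoring a single speech with both tools looks roughly like this. It is a minimal sketch: score_speech is an illustrative helper, and YOUR_API_KEY stands in for a real Perspective API key.

```python
# Minimal sketch: score one speech for sentiment (TextBlob) and
# toxicity (Perspective API). YOUR_API_KEY is a placeholder.
import requests
from textblob import TextBlob

PERSPECTIVE_URL = ("https://commentanalyzer.googleapis.com/"
                   "v1alpha1/comments:analyze")
PERSPECTIVE_KEY = "YOUR_API_KEY"

def score_speech(text):
    # TextBlob polarity runs from -1 (most negative) to 1 (most positive)
    polarity = TextBlob(text).sentiment.polarity

    # Perspective returns a 0-1 score: the likelihood that a reader
    # would find the text toxic
    response = requests.post(
        PERSPECTIVE_URL,
        params={"key": PERSPECTIVE_KEY},
        json={
            "comment": {"text": text},
            "languages": ["en"],
            "requestedAttributes": {"TOXICITY": {}},
        },
    ).json()
    toxicity = response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    return polarity, toxicity
```

Storing both scores alongside each speech row is what makes it possible to sort an entire session from least to most toxic, or from most negative to most positive.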

In another article I am going to explore what the analysis revealed, but despite the limitations of my approach I think TextBlob and the Perspective API did a pretty good job of highlighting speeches that bordered on the problematic and speeches that were strongly negative or positive. The analysis also provided some very interesting insight into which parties and MPs are the most negative or positive.

Check out the results here: http://stuartduncan.ca/parliament/