Individual Submission Summary

Modeling and Predicting Spanish Political Opinions in Twitter through Automatic Sentiment Analysis based on Machine Learning Approaches

Fri, May 26, 9:30 to 10:45, Hilton San Diego Bayfront, Floor: 5, Cobalt 500

Abstract

The confrontation and debate generated in Spain by the emergence of two new political parties has played out on social media, especially on Twitter, where political leaders are highly active. In this context, real-time public opinion research is key to planning political and social interventions. Using a machine-learning approach (Kelleher, Mac Namee, & D'Arcy, 2015), we modeled positive/neutral and negative Twitter messages referring to the four main political parties in Spain in order to predict sentiment in real-time discussions about each party. We labeled a corpus of 8,000 Spanish-language tweets collected on non-electoral dates, each containing a hashtag naming a political party (2,000 for each of #Psoe, #PP, #Podemos, and #Ciudadanos), as either positive/neutral or negative in tone. This corpus was used to train six supervised machine learning models. On a test corpus, we assessed each model's accuracy in predicting sentiment, obtaining values of around 0.70. Given that human agreement on sentiment classification lies between 70% and 80% (Gwet, 2014; Krippendorff, 2012), we may consider this machine accuracy more than acceptable for large-scale data. Apart from the work of García et al. (2016) and Hurtado, Pla & Buscaldi (2015), little research has focused on building supervised machine learning models to classify political texts in Spanish. Using these predictive models, we show how to connect to Twitter's Streaming API, filter for tweets written in Spanish (lang="es") that contain a hashtag of any of the four main Spanish political parties, predict the sentiment of each tweet as it is generated in real time during a non-electoral day, and automatically visualize those predicted with high confidence (>0.80).
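The abstract does not name its six models, its feature representation, or its streaming client, so the following is only a minimal, standard-library sketch of the pipeline it describes: keep tweets whose `lang` is `"es"` and that carry one of the four party hashtags, classify each as positive/neutral or negative, measure accuracy on labeled data, and surface only predictions whose probability exceeds 0.80. Multinomial Naive Bayes with add-one smoothing is swapped in as a stand-in for one supervised model. The tweet dictionaries with `lang` and `text` keys, the labels `pos_neu` and `neg`, all function names, and the toy examples are hypothetical, and the connection to the Streaming API itself is omitted.

```python
import math
import re
from collections import Counter

# Party hashtags named in the abstract, lowercased for matching.
PARTY_HASHTAGS = {"#psoe", "#pp", "#podemos", "#ciudadanos"}


def tokenize(text):
    # Lowercase; keep words and '#'-prefixed hashtags (\w is Unicode-aware, so accents survive).
    return re.findall(r"#?\w+", text.lower())


def matches_filter(tweet):
    """Keep Spanish tweets mentioning at least one party hashtag.
    `tweet` is a hypothetical dict with 'lang' and 'text' keys."""
    if tweet.get("lang") != "es":
        return False
    return any(tok in PARTY_HASHTAGS for tok in tokenize(tweet.get("text", "")))


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            log_scores[c] = score
        # Log-sum-exp normalization turns log scores into posterior probabilities.
        m = max(log_scores.values())
        exp_scores = {c: math.exp(s - m) for c, s in log_scores.items()}
        z = sum(exp_scores.values())
        return {c: s / z for c, s in exp_scores.items()}

    def predict(self, text):
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)


def accuracy(model, texts, labels):
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


def confident(model, texts, threshold=0.80):
    """Yield (text, label, probability) only when the top class exceeds the threshold."""
    for text in texts:
        probs = model.predict_proba(text)
        label = max(probs, key=probs.get)
        if probs[label] > threshold:
            yield text, label, probs[label]


# Toy usage with made-up examples (not the authors' corpus).
train_texts = [
    "gran propuesta de #Podemos hoy",
    "buen debate de #PP",
    "vergüenza y corrupción en #PSOE",
    "mentiras de #Ciudadanos otra vez",
]
train_labels = ["pos_neu", "pos_neu", "neg", "neg"]
model = NaiveBayes().fit(train_texts, train_labels)

incoming = [
    {"lang": "es", "text": "corrupción y mentiras #PSOE"},
    {"lang": "en", "text": "big rally for #PP"},  # dropped by the language filter
    {"lang": "es", "text": "Debate entre #Podemos y #PSOE"},
]
spanish_party_tweets = [t["text"] for t in incoming if matches_filter(t)]
# Only confident predictions are printed; borderline tweets are skipped.
for text, label, prob in confident(model, spanish_party_tweets):
    print(f"{label} ({prob:.2f}): {text}")
```

Reading the 0.80 cut-off as the top-class posterior probability is an assumption; with a different classifier the same filter would apply to whatever calibrated score that model produces.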
These models may become even more relevant if applied on election day, to evaluate whether they can predict voting results in the way exit polls do. Modeling Spanish political opinion on Twitter thus offers a unique opportunity to test predictions of future election results in a country where the accuracy of surveys is constantly scrutinized by the public and by experts. Moreover, predicting political sentiment on Twitter in real time over a long period allows longitudinal analysis that detects changes over time for each political party and compares them with day-to-day events, which in turn enables better interventions based on communication research.
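The abstract does not say how longitudinal changes would be measured. One simple option, assumed here purely for illustration, is to compute each party's daily share of negative predictions and flag consecutive observed days on which that share moves by at least a chosen amount. The record layout `(day, party, label)`, the 0.15 default, the function names, and the toy records are hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_negative_share(records):
    """records: iterable of (day, party, label), label 'neg' or 'pos_neu'.
    Returns {party: {day: share_of_negative_tweets}} with days in sorted order."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [negative, total]
    for day, party, label in records:
        cell = counts[party][day]
        cell[1] += 1
        if label == "neg":
            cell[0] += 1
    return {
        party: {day: neg / total for day, (neg, total) in sorted(days.items())}
        for party, days in counts.items()
    }


def flag_shifts(shares, min_change=0.15):
    """Return (party, previous_day, day, change) wherever the negative share moved
    by at least min_change between consecutive observed days."""
    flags = []
    for party, series in shares.items():
        days = sorted(series)
        for prev, cur in zip(days, days[1:]):
            change = series[cur] - series[prev]
            if abs(change) >= min_change:
                flags.append((party, prev, cur, change))
    return flags


# Toy records standing in for classifier output over two days.
records = [
    (date(2016, 5, 1), "#PP", "neg"),
    (date(2016, 5, 1), "#PP", "pos_neu"),
    (date(2016, 5, 2), "#PP", "neg"),
    (date(2016, 5, 2), "#PP", "neg"),
    (date(2016, 5, 1), "#Podemos", "pos_neu"),
    (date(2016, 5, 2), "#Podemos", "pos_neu"),
]
shares = daily_negative_share(records)
for party, prev, cur, change in flag_shifts(shares):
    print(f"{party}: negative share changed by {change:+.2f} from {prev} to {cur}")
```

Flagged dates could then be matched against a calendar of day-to-day events; a real analysis would likely also smooth the series and account for daily tweet volume, which this sketch does not.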

Authors