Analyzing Tweets to Final Thoughts – Text Analytics
May 7, 2016
On April 24, 2016, the curtain fell on the fiery series of Pilipinas Debates, although it is still anybody’s ball game until the ninth day of May this year.
Even so, studying the outcome of the debate is important for evaluating how it has influenced the voting population. This initiative by the Commission on Elections (COMELEC) to organize a debate series for our aspiring national leaders was very timely, given how social media has become the norm in the everyday lives of most people. With this comes quicker access to information and more freedom of expression, two elements that led to proactive and responsive interaction among users and resulted in trending topics such as #PiliPinasDebates2016.
Knowing that much information can be derived from this huge mass of data, the Pointwest Data Analytics team published a statistical analysis on the second leg of the presidential debate. In this study, the team sought to understand the tweets behind the last leg to wrap up this series.
The third and final stretch of Pilipinas Debates 2016 took place at Phinma University of Pangasinan on April 24, 2016. Following a town hall meeting format, it was aired live by ABS-CBN from 5:45 PM to 9 PM, post-debate analysis excluded. Unlike the previous leg, all five presidentiables were present at the event. The official hashtag trended on Twitter with over a million tweets, which showed consistent interest on the part of the voters.
The team worked on the same hashtags, #PiliPinasDebates2016 and #PiliPinasDebates, and used Twitter’s streaming API to extract data from 12:00 AM to 11:59 PM on April 24. A total of 828,795 tweets were gathered, and 457,941 of those contained mentions of the presidential candidates.
For this analysis, a more comprehensive set of keywords was used in determining the tweets with mentions.
- Binay – Binay, Jejomar, VPJojoBinay, VPJejomarBinay, JojoBinay, JejomarBinay, Jojo, Binay2016, JojoBinay2016, JejomarBinay2016
- Duterte – Du30, Duterte, RDD_Davao, Digong, Rody, Rodrigo, RodyDuterte, RodrigoDuterte, MayorDuterte, MayorRodyDuterte, Duterte2016, RodyDuterte2016, RodrigoDuterte2016
- Poe – Grace, Poe, SenGracePoe, GracePoe, SenGrace, SenPoe, Poe2016, GracePoe2016, Grace2016
- Roxas – Mar, Roxas, MARoxas, MarRoxas, SecMar, SecMarRoxas, SecRoxas, Mar2016, Roxas2016
- Santiago – MDS, Miriam, Defensor-Santiago, SenMiriam, Santiago, Defensor, Meriam, MiriamDefensor-Santiago, MiriamSantiago, MDS2016, Miriam2016, Santiago2016, MiriamSantiago2016
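To make the keyword step concrete, here is a minimal sketch of how tweets could be matched against lists like the ones above. The article does not say how the matching was actually done, so the case-insensitive, whole-token matching below is an assumption, and the names `CANDIDATE_KEYWORDS`, `build_pattern`, and `candidates_mentioned` are hypothetical. The keyword lists are abbreviated for brevity.

```python
import re

# Abbreviated keyword lists taken from the article; extend with the full lists as needed.
CANDIDATE_KEYWORDS = {
    "Binay": ["Binay", "Jejomar", "VPJojoBinay", "Jojo"],
    "Duterte": ["Du30", "Duterte", "Digong", "Rody", "Rodrigo"],
    "Poe": ["Grace", "Poe", "SenGracePoe"],
    "Roxas": ["Mar", "Roxas", "MARoxas", "SecMar"],
    "Santiago": ["MDS", "Miriam", "Defensor-Santiago", "Santiago", "Meriam"],
}


def build_pattern(keywords):
    """Compile one case-insensitive regex that matches any keyword as a whole token."""
    # Longer keywords go first so they are tried before their shorter prefixes.
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    # Assumption: a keyword only counts when it is not glued to other letters,
    # digits, or underscores (so "Mar" does not match inside "Market").
    return re.compile(
        rf"(?<![A-Za-z0-9_])(?:{alternatives})(?![A-Za-z0-9_])",
        re.IGNORECASE,
    )


PATTERNS = {name: build_pattern(kws) for name, kws in CANDIDATE_KEYWORDS.items()}


def candidates_mentioned(tweet_text):
    """Return the set of candidates whose keywords appear in the tweet."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(tweet_text)}
```

Even with whole-token matching, short or common keywords such as "Mar" and "Grace" can still produce false positives, so a real pipeline would likely need some manual spot checks.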
Social Media Analytics on Share of Voice
The presidentiables’ impact on Twitter users was again assessed using the Share of Voice metric, which follows the same definition as the one used in our prior analysis (Read Case Study: From Tweets to Thoughts). With a new set of keywords, a different debate format, and all the candidates in attendance, the results for this metric were expected to be different.
In agreement with the April 18-20 survey results of Social Weather Stations (SWS), Davao City Mayor Rodrigo Duterte took the biggest portion, with the highest number of mentions in our data from Twitter. Close behind was Senator Miriam Defensor-Santiago, who did not fail to make her presence felt during the final stretch of the debate series. Former DILG Secretary Mar Roxas grabbed third place, another result consistent with the recent SWS survey. Vice President Jejomar Binay and Senator Grace Poe made up the bottom two, each garnering a score below the twenty percent (20%) mark.
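For readers who want to see the arithmetic, the sketch below computes a share of voice from per-candidate mention counts. The exact definition lives in the prior case study and is not restated here, so this assumes the common one: a candidate's mentions divided by the total mentions of all candidates. The function name `share_of_voice` is hypothetical, and the counts in the test are placeholders, not the study's figures.

```python
def share_of_voice(mention_counts):
    """Return each candidate's share of all candidate mentions, in percent.

    Assumption: share = candidate mentions / sum of mentions over all candidates.
    A tweet that mentions two candidates counts once for each of them.
    """
    total = sum(mention_counts.values())
    if total == 0:
        # No mentions at all: every share is zero rather than a division error.
        return {name: 0.0 for name in mention_counts}
    return {name: 100.0 * count / total for name, count in mention_counts.items()}
```

Under this definition the shares always sum to 100 percent, which is what makes a statement like "below the twenty percent mark" comparable across candidates.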
Text Analytics on Polarity
The team added another layer to the analysis: polarity. Polarity is a term used in text analytics to refer to whether a text is positive or negative about its subject. To tag the hundreds of thousands of extracted tweets with polarity information, the team used the same algorithm and corpus as in our previous analysis, enriched with a batch of newly tagged double-meaning Filipino words and emojis.
The results per candidate were inspected, from the one(s) with the highest number of tweets to the one(s) with the lowest. In the following paragraph, two ratios are explored: neutral tweets to tweets with polarity, and negative tweets to positive tweets.
The first ratio, neutral tweets to tweets with polarity, was about the same for almost all candidates. For each candidate except Senator Poe, approximately forty percent of the tweets were classified as neutral, while sixty percent were tagged with polarity. These four candidates, however, differed on the second ratio. For every negative tweet about Senator Santiago, there were around three positive ones. For Mayor Duterte, there was about one positive tweet for every two negative ones. For Secretary Roxas, it was about two negative tweets for every three positive ones. For Vice President Binay, every positive tweet was countered by about one negative tweet. Only twenty-five percent of the tweets about Senator Poe were neutral, and her negative-to-positive ratio was much like Senator Santiago’s, at about one to three.
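The team's algorithm and corpus are not described in this article, so the following is only an illustration of the general idea of polarity tagging, not the method actually used. It is a minimal lexicon-based sketch: each token (word or emoji) found in a small lookup table adds to or subtracts from a score, and the sign of the score decides the tag. The lexicon entries and the names `POLARITY_LEXICON` and `tag_polarity` are hypothetical.

```python
import re

# Hypothetical toy lexicon: +1 for positive tokens, -1 for negative tokens.
# A real corpus would be far larger and would handle double-meaning words in context.
POLARITY_LEXICON = {
    "magaling": 1,      # Filipino: skilled, good
    "good": 1,
    "👍": 1,
    "sinungaling": -1,  # Filipino: liar
    "bad": -1,
    "👎": -1,
}

# A token is either a run of word characters or a single symbol such as an emoji.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tag_polarity(text):
    """Tag a text as 'positive', 'negative', or 'neutral' from summed lexicon scores."""
    tokens = TOKEN_RE.findall(text.lower())
    score = sum(POLARITY_LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A scheme this simple cannot handle sarcasm or negation, which is part of why enriching the corpus with context-dependent words matters.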
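The two ratios can be combined into a single percentage breakdown per candidate. The sketch below does that arithmetic using the rounded figures quoted in the paragraph above, so the outputs are approximations derived from those figures and not the study's exact numbers. The function name `polarity_breakdown` is hypothetical.

```python
def polarity_breakdown(neutral_pct, negative_parts, positive_parts):
    """Split tweets into neutral, negative, and positive percentages.

    neutral_pct: percent of tweets classified as neutral.
    negative_parts, positive_parts: the negative-to-positive ratio, e.g. 1 and 3.
    """
    polar_pct = 100.0 - neutral_pct  # share of tweets tagged with polarity
    total_parts = negative_parts + positive_parts
    return {
        "neutral": float(neutral_pct),
        "negative": polar_pct * negative_parts / total_parts,
        "positive": polar_pct * positive_parts / total_parts,
    }


# Approximate inputs read off the article's rounded ratios (negative : positive).
print(polarity_breakdown(40, 1, 3))  # Santiago: about 15% negative, 45% positive
print(polarity_breakdown(40, 2, 1))  # Duterte: about 40% negative, 20% positive
print(polarity_breakdown(25, 1, 3))  # Poe: about 18.75% negative, 56.25% positive
```

Read this way, a candidate with a large share of voice can still have more negative than positive tweets, which is why the two metrics are reported separately.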
As in a previously published article, the same point is emphasized here: SHARE OF VOICE does not translate to SHARE OF VOTES. Share of voice only answers the question, “How often was the candidate mentioned?” Polarity could give us a better clue, but our sample is not a representative one.
The team clarified that the results of this study will not in any way predict who the winners of the elections will be. First, not all tweeters are registered voters; some are even below 18:
— Carla Quizon (@carlakeyzone) April 24, 2016
Also, not all registered voters have a Twitter account or were active in the Twitterverse during the debate. People also have different “tweeting rates.” But however small the intersection of the two populations, the registered voters and the tweeters, may be, it is still interesting to study the polarity of tweets.
Social media may not be perfect in capturing real life situations, but we could not say that it is an entirely different world from the one we are physically living in. People affect social media; your friends help determine what your news feed will look like. Social media also affects us, changing our lifestyle and impacting our decisions.
News may go viral on social media before the mainstream media reports it, and some people nowadays rely on their news feed to stay updated. People upload and retweet not only statuses, photos, and videos, but also their opinions and arguments. These, in turn, are “downloaded” by their followers (or even strangers) and help them make decisions, from inconsequential ones such as where to have dinner to life-changing ones such as whom to vote for on the 9th of May.
Thus, we study what is happening on social media not as a replica of what is happening around us, but as a factor that could have a growing influence on it.