Unsolicited Sentiment Analytics for Airlines in the US

Mike Togle
November 16, 2016


Passengers who travel regularly know how airlines struggle to provide a consistently positive customer experience. To understand what passengers say about their services, airlines have been engaging with passengers on social media and measuring the response through sentiment analytics.

Social media sites are useful sources of unstructured data. Customer experience and social marketing professionals can use this data to identify issues with the facilities and services airlines provide, and to see what could be added to improve the customer experience.

Text analytics is one way to give decision makers information on what people say about their brands on social media. As an example, analyses were done on tweets directed to five major airline companies in the US: American Airlines (@AmericanAir), Delta (@Delta and @DeltaAssist), Southwest Airlines (@SouthwestAir), United Airlines (@United), and Virgin America (@VirginAmerica). Sentiment analytics was performed to classify these tweets into three categories – positive, negative, and neutral.

Furthermore, visualizations and descriptive statistics were included to give us more insight into passenger sentiment. The count of negatively tagged tweets increases with the flow of customer complaints. We drilled down to the cause by inspecting which aspects or services of the airlines the complaints are about.


Data Extraction

All tweets were extracted from April 17, 2016 to April 27, 2016 through a Twitter API called ‘Search’. The API takes one or more keywords and returns tweets containing them. It does not return every matching tweet; instead, it returns recent and popular ones. As mentioned above, we used the official Twitter handles of the five airlines as keywords in this example.
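
As a rough illustration of the extraction step, the sketch below builds the query parameters for a keyword search over the airlines' handles. The parameter names (`q`, `until`, `count`, `result_type`) are modeled on Twitter's standard v1.1 search endpoint and should be treated as assumptions, as should the end date passed to `until`. The helper name `build_search_params` is hypothetical, and authentication and the request itself are left out.

```python
# Hypothetical helper: builds the parameters for one keyword search request.
# The HTTP request and authentication are deliberately omitted.

AIRLINE_HANDLES = [
    "@AmericanAir", "@Delta", "@DeltaAssist",
    "@SouthwestAir", "@United", "@VirginAmerica",
]

def build_search_params(handles, until, count=100, result_type="mixed"):
    """Return a dict of query parameters for a keyword search.

    handles     -- keywords to search for, joined with OR
    until       -- upper date bound, YYYY-MM-DD (assumed to be exclusive)
    count       -- tweets requested per page (assumed limit)
    result_type -- "mixed" is assumed to ask for recent and popular tweets
    """
    return {
        "q": " OR ".join(handles),
        "until": until,
        "count": count,
        "result_type": result_type,
    }

# Assumption: an exclusive bound of April 28 keeps tweets from April 27.
params = build_search_params(AIRLINE_HANDLES, until="2016-04-28")
print(params["q"])
```

Joining the handles with `OR` keeps the extraction to a single query; running one query per airline would work just as well and makes it easier to attribute each tweet to an airline.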

Word Cloud

Word clouds are visual representations of word frequency. The font size of each word reflects how often it is used: the larger the font, the more frequently the word appeared in the tweets. Word clouds help communicate the most salient themes by highlighting the most frequently used terms. In this example, we used R’s ‘wordcloud’ library to make the word clouds.
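
The counts that drive a word cloud can be sketched in a few lines. The example below is in Python rather than the R library used for the actual figures, and it only computes word frequencies; it does not draw anything. The stop-word list and the sample tweets are made up for illustration.

```python
import re
from collections import Counter

# Assumed stop-word list, kept tiny for illustration.
STOP_WORDS = {"the", "a", "an", "to", "my", "is", "was", "for", "and", "on", "you"}

def word_frequencies(tweets):
    """Count how often each word appears, ignoring @handles and stop words."""
    counts = Counter()
    for tweet in tweets:
        # Drop @mentions so the airline handles do not dominate the cloud.
        text = re.sub(r"@\w+", " ", tweet.lower())
        words = re.findall(r"[a-z']+", text)
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

# Made-up sample tweets, not taken from the study's data.
sample = [
    "@Delta thanks for the great service",
    "@United my flight was late and the bag was lost",
    "@SouthwestAir great crew, thanks",
]
freq = word_frequencies(sample)
print(freq.most_common(3))
```

A word cloud renderer would then map each count to a font size.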

Sentiment Analytics

Sentiment analytics is a text analytics technique used to determine whether a piece of writing (a phrase, a sentence, or a whole document) is positive, neutral, or negative in nature. It can also derive the opinion of the author. The analysis can be done by establishing a set of rules that enables the sentiment analytics model to examine the nature of the text.

Polarity of tweets

Using Natural Language Processing (NLP) techniques, a list of words tagged with polarity, and a set of language rules translated into Python code, we classified the tweets into three categories: positive, neutral, and negative.
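
A minimal sketch of this kind of rule-based scoring is shown below. It is not the model used in the study: the lexicon is a tiny hypothetical one, and only a single example language rule is applied (a negator flips the polarity of the word right after it).

```python
import re

# Tiny hypothetical polarity lexicon; a real one would hold thousands of entries.
LEXICON = {
    "great": 1, "love": 1, "thanks": 1,
    "rude": -1, "lost": -1, "late": -1, "cancelled": -1,
}
NEGATORS = {"not", "no", "never"}

def classify_polarity(tweet):
    """Label a tweet positive, negative, or neutral by summing word polarities."""
    words = re.findall(r"[a-z']+", tweet.lower())
    score = 0
    for i, word in enumerate(words):
        polarity = LEXICON.get(word, 0)
        # Example rule: a negator flips the polarity of the next word.
        if i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_polarity("Thanks for the great flight"))
print(classify_polarity("Not great at all"))
```

Words missing from the lexicon contribute nothing to the score, so a tweet with no tagged words comes out as neutral.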

Classification of negatively scored tweets

To further understand what customers complain about on Twitter, we classified negative tweets according to reasons (bad flight, cancelled flight, customer service, damaged luggage, flight attendant complaints, flight booking problems, late flight, lost luggage, and long lines). This analysis utilized a mixture of machine learning algorithms and NLP techniques. It involved tagging thousands of negative tweets with reasons similar to those mentioned above and training a classification model using the tagged tweets. To improve the performance of the classification model, the context of the tweet was considered by following a set of language rules.
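
The text above does not name the algorithm that was trained, so the sketch below assumes a multinomial Naive Bayes classifier with add-one smoothing purely for illustration. The training tweets are made up, and the class name `NaiveBayesReasonClassifier` is hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesReasonClassifier:
    """Multinomial Naive Bayes with add-one smoothing over word counts."""

    def fit(self, tweets, reasons):
        self.word_counts = defaultdict(Counter)  # reason -> word -> count
        self.reason_counts = Counter(reasons)    # reason -> number of tweets
        self.vocabulary = set()
        for tweet, reason in zip(tweets, reasons):
            words = tokenize(tweet)
            self.word_counts[reason].update(words)
            self.vocabulary.update(words)
        return self

    def predict(self, tweet):
        total = sum(self.reason_counts.values())
        vocab_size = len(self.vocabulary)
        best_reason, best_score = None, -math.inf
        for reason, n_tweets in self.reason_counts.items():
            counts = self.word_counts[reason]
            n_words = sum(counts.values())
            score = math.log(n_tweets / total)  # log prior
            for word in tokenize(tweet):
                if word in self.vocabulary:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / (n_words + vocab_size))
            if score > best_score:
                best_reason, best_score = reason, score
        return best_reason

# Made-up tagged tweets; the study tagged thousands of real ones.
train = [
    ("my flight is delayed again", "late flight"),
    ("two hour delay at the gate", "late flight"),
    ("the agent was rude on the phone", "customer service"),
    ("no one answers the customer service line", "customer service"),
    ("you lost my bag", "lost luggage"),
    ("my luggage never arrived", "lost luggage"),
]
tweets, reasons = zip(*train)
model = NaiveBayesReasonClassifier().fit(tweets, reasons)
print(model.predict("flight delayed for hours"))
```

With so few examples the model is only a toy; the language rules mentioned above would sit on top of a classifier like this, for instance by adjusting the tokens before they are counted.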

Descriptive Statistics

Number of tweets per airline

Number of Tweets Per Airline (Bar Graph)

The bar chart above shows that American Airlines garnered the most mentions in tweets from passengers for the given time period, while Virgin America garnered the fewest. Further analysis may show how the mentions relate to positive or negative sentiment about the airlines.

Word cloud

Word Cloud Based on Tweets

Word clouds give us a rough idea of what people are talking about. In this case, the word cloud for positively scored tweets is separated from that for negatively scored ones. The word cloud on the left shows the words most frequently used in tweets with a positive sentiment score. We can see that “thanks”, “great”, “love”, “customer”, “service”, “security”, and “time” are among them. What does this say about the airlines’ services? We can infer that some passengers were satisfied with the airlines’ customer service. On the other hand, the word cloud on the right shows words used in negatively scored tweets. Words such as “rude”, “missed”, “lost”, “stuck”, “late”, “delay”, and “cancelled” are included. From this, we can infer that the complaints of the passengers who tweeted are mostly about cancelled or delayed flights, lost baggage, and the airlines’ customer service. A deeper analysis follows in the later part of this study.

Number of tweets by sentiments (overall)

Number of Tweets by Polarity (Bar Graph)

In general, the positive tweets outnumbered the negative ones, although only by a small margin. Airlines would naturally want to reduce negative responses from their passengers. With this many negative comments relative to positive ones, they might want to examine why their customers give them negative feedback. The next figure shows the number of positive, neutral, and negative tweets across the five aforementioned airlines.

Number of Tweets Per Polarity Per Airline (Bar Graph)

Although American Airlines accumulated the most mentions, it also had the smallest gap between its positive and negative tweet counts. The polarity chart above similarly shows that Southwest and United are on par not only in tweet count, but also in their ratio of positive to negative tweets. Virgin America, while the least mentioned airline, received twice as many positive tweets as negative ones.

In this analysis, however, Delta can be considered the leading airline, having both a strong Twitter share of voice and relatively more positive tweets than negative ones. Overall, there is not much difference between the number of negative and positive tweets across the five airlines, which suggests that many of their customers are not satisfied with their services. Further analysis will be done to dissect the possible reasons why these companies receive so many negative responses.
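
The per-airline comparison comes down to counting tweets by airline and polarity and then taking a ratio. A small sketch, using made-up records rather than the study's counts, with hypothetical helper names:

```python
from collections import Counter

def polarity_counts(records):
    """Count tweets per (airline, polarity) pair."""
    return Counter(records)

def positive_to_negative_ratio(counts, airline):
    """Return positives divided by negatives, or None when there are no negatives."""
    negatives = counts[(airline, "negative")]
    if negatives == 0:
        return None
    return counts[(airline, "positive")] / negatives

# Made-up labeled records (airline, polarity), not the study's data.
records = [
    ("Virgin America", "positive"), ("Virgin America", "positive"),
    ("Virgin America", "negative"),
    ("American Airlines", "positive"), ("American Airlines", "negative"),
    ("American Airlines", "neutral"),
]
counts = polarity_counts(records)
print(positive_to_negative_ratio(counts, "Virgin America"))
```

Reading the ratio alongside the raw tweet count keeps share of voice and sentiment visible together, which is the comparison made above.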

Tweets with negative sentiments

Reasons for Negative Tweets (Bar Graph)

The figure shows the reasons behind negatively scored tweets directed to Delta. A large share of these tweets are about delayed flights and bad customer service. (A similar analysis can be done for the other airlines; as an example, only tweets directed to Delta are analyzed here.)


Monitoring what customers say about a particular brand is tedious when done manually, considering the amount of data available from social media sites. We learned from this study that airlines receive thousands of pieces of feedback from customers on Twitter alone. With machine learning algorithms and natural language processing techniques from the field of text analytics, analyzing customer sentiment now takes far less effort than the traditional approach (a team monitoring social media sites manually, poring over tweets and Facebook comments).

In this study, we have learned that while the amount of feedback an airline receives reflects how popular its brand is, a strong share of voice on social media does not always mean a positive image, as in the case of American Airlines. We have seen that, overall, a lot of negative feedback is voiced on social networking sites (SNS). It is therefore advisable for airlines not just to monitor a single-value metric such as share of voice, but also to understand the process that generates this number.

More important than knowing the proportion of negative tweets is being cognizant of the reasons why airlines receive such feedback. This is what we tried to do by sorting out the tweets according to the aspect of airline service they pertain to. By doing so, we have broken down this bulk of complaints into more manageable chunks.

Users of the study results could use this information in different ways. Airlines may want to deal with the most common issues – delayed flights and customer service – that trigger passengers to express their frustrations on these sites, because publicly posted complaints put brand reputation at risk. They could review the compensation offered to passengers with delayed flights, or go as far as addressing the root of the problem, if it can be solved within the airline. Customer service could be improved by giving refresher training to both ground staff and flight attendants.

While these findings should not be set aside, the airlines could also capitalize on their strengths: long lines during the onboarding process are rarely reported, and damaged luggage accounts for almost zero percent of complaints.

Aside from branding, such sentiments are gathered and used by Customer Experience Designers to improve services and products. By turning this information into actionable insights, airlines can decide to make changes to features, functions, or people to correct errors and enhance the experience of their patrons and target market.



Founded in 2003 by pioneers of the Philippine Global Sourcing industry, Pointwest creates value for its list of satisfied clients — including top Fortune 100 and local companies — with world-class IT and BPM services backed by international-standards methodologies and innovative practices.