Background: The increasing popularity of hookah (or waterpipe) use in the United States and elsewhere has consequences for public health because it has similar health risks to that of combustible cigarettes. While hookah use rapidly increases in popularity, social media data (Twitter, Instagram) can be used to capture and describe the social and environmental contexts in which individuals use, perceive, discuss, and are marketed this tobacco product. These data may allow people to organically report on their sentiment toward tobacco products like hookah unprimed by a researcher, without instrument bias, and at low costs.
Objective: This study describes the sentiment of hookah-related posts on Twitter and describes the importance of debiasing Twitter data when attempting to understand attitudes.
Methods: Hookah-related posts on Twitter (N=986,320) were collected from March 24, 2015, to December 2, 2016. Machine learning models were used to describe sentiment on 20 different emotions and to debias the data so that Twitter posts reflected sentiment of legitimate human users and not of social bots or marketing-oriented accounts that would possibly provide overly positive or overly negative sentiment of hookah.
Results: From the analytical sample, 352,116 tweets (59.50%) were classified as positive while 177,537 (30.00%) were classified as negative, and 62,139 (10.50%) neutral. Among all positive tweets, 218,312 (62.00%) were classified as highly positive emotions (eg, active, alert, excited, elated, happy, and pleasant), while 133,804 (38.00%) positive tweets were classified as passive positive emotions (eg, contented, serene, calm, relaxed, and subdued). Among all negative tweets, 95,870 (54.00%) were classified as subdued negative emotions (eg, sad, unhappy, depressed, and bored) while the remaining 81,667 (46.00%) negative tweets were classified as highly negative emotions (eg, tense, nervous, stressed, upset, and unpleasant). Sentiment changed drastically when comparing a corpus of tweets with social bots to one without. For example, the probability of any one tweet reflecting joy was 61.30% from the debiased (or bot free) corpus of tweets. In contrast, the probability of any one tweet reflecting joy was 16.40% from the biased corpus.
Conclusions: Social media data provide researchers the ability to understand public sentiment and attitudes by listening to what people are saying in their own words. Tobacco control programmers in charge of risk communication may consider targeting individuals posting positive messages about hookah on Twitter or designing messages that amplify the negative sentiments. Posts on Twitter communicating positive sentiment toward hookah could add to the normalization of hookah use and is an area of future research. Findings from this study demonstrated the importance of debiasing data when attempting to understand attitudes from Twitter data.
Keywords: Twitter; big data; bots; hookah; sentiment; social media; waterpipe.
©Jon-Patrick Allem, Jagannathan Ramanujam, Kristina Lerman, Kar-Hai Chu, Tess Boley Cruz, Jennifer B Unger. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 18.10.2017.