An Ensemble Deep Learning Model for Drug Abuse Detection in Sparse Twitter-Sphere

Han Hu; NhatHai Phan; James Geller; Stephen Iezzi; Huy Vo; Dejing Dou; Soon Ae Chun

doi:10.3233/SHTI190204


Stud Health Technol Inform. 2019 Aug 21;264:163-167. doi: 10.3233/SHTI190204.

Authors

Han Hu¹, NhatHai Phan¹, James Geller¹, Stephen Iezzi¹, Huy Vo², Dejing Dou³, Soon Ae Chun⁴

Affiliations

¹ Ying Wu College of Computing, New Jersey Institute of Technology, Newark, NJ, USA.
² Department of Computer Science, The City College of New York, New York, NY, USA.
³ Computer and Information Science, University of Oregon, Eugene, OR, USA.
⁴ Information Systems & Informatics, City University of New York, Staten Island, NY, USA.

PMID: 31437906
DOI: 10.3233/SHTI190204

Abstract

As the problem of drug abuse intensifies in the U.S., many studies that primarily use social media data, such as Twitter posts, to investigate drug abuse-related activities rely on machine learning as a powerful tool for text classification and filtering. However, because Twitter users post about a wide range of topics, tweets related to drug abuse are rare in most datasets. This class imbalance remains a major obstacle to building effective tweet classifiers, and it is especially pronounced in studies that include abuse-related slang terms. In this study, we address this problem by designing an ensemble deep learning model that leverages both word-level and character-level features to classify abuse-related tweets. Experiments are reported on a Twitter dataset in which the percentages of the two classes (abuse vs. non-abuse) can be configured to simulate different degrees of imbalance. Results show that our ensemble deep learning models outperform ensembles of traditional machine learning models, especially on heavily imbalanced datasets.
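
The abstract names two mechanics without detailing them: building datasets with a configurable abuse/non-abuse ratio, and combining word-level and character-level models into one ensemble. The Python sketch below illustrates both under stated assumptions and is not the authors' implementation. The helper names (`make_imbalanced`, `word_scorer`, `char_scorer`, `soft_vote`) and the slang list are hypothetical. The two toy scorers only stand in for trained word-level and character-level deep models. Averaging their probabilities (soft voting) is an assumed combination rule, because the abstract does not specify one.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical slang lexicon, for illustration only (not from the paper).
ABUSE_TERMS = {"xanax", "percs", "molly"}


def make_imbalanced(positives: Sequence[str], negatives: Sequence[str],
                    pos_fraction: float, total: int,
                    seed: int = 0) -> List[Tuple[str, int]]:
    """Sample `total` labeled tweets, about `pos_fraction` of them abuse (label 1)."""
    if not 0.0 < pos_fraction < 1.0:
        raise ValueError("pos_fraction must be strictly between 0 and 1")
    n_pos = round(total * pos_fraction)
    n_neg = total - n_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("not enough tweets for the requested size and ratio")
    rng = random.Random(seed)  # seeded so each imbalance setting is reproducible
    data = [(t, 1) for t in rng.sample(list(positives), n_pos)]
    data += [(t, 0) for t in rng.sample(list(negatives), n_neg)]
    rng.shuffle(data)
    return data


def word_scorer(text: str) -> float:
    """Toy word-level stand-in: 1.0 if any whole token is a known slang term."""
    return 1.0 if any(tok in ABUSE_TERMS for tok in text.lower().split()) else 0.0


def char_ngrams(token: str, n: int = 3) -> Set[str]:
    """Character n-grams of a token, padded with '#' to mark word boundaries."""
    padded = f"#{token.lower()}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def char_scorer(text: str) -> float:
    """Toy character-level stand-in: best trigram Jaccard similarity to any slang term."""
    best = 0.0
    for tok in text.lower().split():
        grams = char_ngrams(tok)
        for term in ABUSE_TERMS:
            ref = char_ngrams(term)
            best = max(best, len(grams & ref) / len(grams | ref))
    return best


def soft_vote(scorers: Sequence[Callable[[str], float]], text: str,
              threshold: float = 0.5) -> int:
    """Assumed soft voting: average the base models' scores and threshold the mean."""
    mean_p = sum(score(text) for score in scorers) / len(scorers)
    return int(mean_p >= threshold)


if __name__ == "__main__":
    pos = [f"abuse tweet {i}" for i in range(10)]
    neg = [f"other tweet {i}" for i in range(100)]
    data = make_imbalanced(pos, neg, pos_fraction=0.1, total=50)
    print(len(data), sum(label for _, label in data))  # 50 5
    for tweet in ["popping xanax again", "great weather today"]:
        print(tweet, soft_vote([word_scorer, char_scorer], tweet))
```

Lowering `pos_fraction` (for example from 0.5 to 0.05) is one way to simulate increasingly heavy imbalance. Character trigrams let the character-level stand-in partially match a misspelled slang term such as "xannax" (Jaccard similarity 4/7 with "xanax"), which the whole-token lookup misses entirely. This is one plausible motivation for pairing the two feature levels.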

Keywords: Machine Learning; Social Media; Substance-Related Disorders.

MeSH terms

Data Collection
Deep Learning
Machine Learning
Social Media*
Substance Abuse Detection