COVID-19 outbreak: An ensemble pre-trained deep learning model for detecting informative tweets

SreeJagadeesh Malla; Alphonse P J A

doi:10.1016/j.asoc.2021.107495

Appl Soft Comput. 2021 Aug:107:107495. doi: 10.1016/j.asoc.2021.107495. Epub 2021 May 21.

Authors

SreeJagadeesh Malla¹, Alphonse P J A¹

Affiliation

¹ Department of Computer Applications, National Institute of Technology, Tiruchirappalli, 620015, India.

Abstract

On 11 March 2020, the World Health Organization (WHO) declared COVID-19 (Coronavirus Disease 2019) a pandemic. Alongside the pandemic, a further crisis has emerged in the form of mass fear and panic, driven by a lack of information or, at times, outright misinformation. Twitter is one of the most prominent and trusted social media platforms during the current outbreak. Over time, countless COVID-19 headlines and widespread awareness have spread through tweets, updates, videos, and viral posts. Few studies on the pandemic have attempted to detect and interrelate various disease types, including the current coronavirus, and discriminating and detecting a specific category of tweet remains difficult. This work is motivated by the need to limit the irrelevant information reaching society and to avoid spreading negative emotions. In this context, the current work focuses on detecting informative tweets during the pandemic in order to provide relevant information to the government, medical organizations, victim services, etc. This paper uses a Majority Voting technique-based Ensemble Deep Learning (MVEDL) model to identify COVID-19 related (INFORMATIVE) tweets. The state-of-the-art deep learning models RoBERTa, BERTweet, and CT-BERT are combined in the MVEDL model for best performance. The "COVID-19 English labeled tweets" dataset is used for training and testing the MVEDL model. The MVEDL model achieves 91.75 percent accuracy and a 91.14 percent F1-score, and it outperforms traditional machine learning and deep learning models. We also investigate how to use the MVEDL model for sentiment analysis on 226,668 unlabeled COVID-19 tweets and on the informative tweets among them. The application section presents a comprehensive analysis of both the actual and the informative tweets. To our knowledge, this is the first work on COVID-19 sentiment analysis using a deep learning ensemble model.

Keywords: BERTweet; COVID-19; CT-BERT; Deep learning; Health emergency; Informative tweets; Majority voting; RoBERTa; Sentiment analysis.
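
The majority voting step described in the abstract can be illustrated with a minimal sketch. It assumes that each of the three base models (RoBERTa, BERTweet, CT-BERT) has already produced one label per tweet; the function name `majority_vote`, the `UNINFORMATIVE` label, and the sample predictions are hypothetical and used only for illustration. This is not the authors' implementation, and the abstract does not specify how the votes are combined beyond "majority voting".

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one ensemble label per tweet.

    predictions_per_model: list of equal-length lists, one list per model.
    With three models and two labels, a strict majority always exists.
    """
    if not predictions_per_model:
        raise ValueError("at least one model's predictions are required")
    n_tweets = len(predictions_per_model[0])
    if any(len(p) != n_tweets for p in predictions_per_model):
        raise ValueError("all models must label the same number of tweets")

    ensemble = []
    # zip(*...) groups the votes that every model cast for the same tweet.
    for votes in zip(*predictions_per_model):
        label, _count = Counter(votes).most_common(1)[0]
        ensemble.append(label)
    return ensemble


# Hypothetical predictions for four tweets (illustrative only).
roberta = ["INFORMATIVE", "UNINFORMATIVE", "INFORMATIVE", "UNINFORMATIVE"]
bertweet = ["INFORMATIVE", "INFORMATIVE", "UNINFORMATIVE", "UNINFORMATIVE"]
ct_bert = ["UNINFORMATIVE", "INFORMATIVE", "INFORMATIVE", "UNINFORMATIVE"]

ensemble_labels = majority_vote([roberta, bertweet, ct_bert])
print(ensemble_labels)
```

Using an odd number of base models with a binary label set avoids ties, which is one plausible reason for combining exactly three classifiers.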