Prediction of COVID-19 using long short-term memory by integrating principal component analysis and clustering techniques

Inform Med Unlocked. 2022;31:100990. doi: 10.1016/j.imu.2022.100990. Epub 2022 Jun 3.

Abstract

Coronaviruses are a large family of viruses that cause infections in both animals and humans, ranging from the common cold to coronavirus disease (COVID-19), severe acute respiratory syndrome (SARS), and Middle East respiratory syndrome (MERS). This study aims to predict the number of COVID-19 positive cases in 36 states of Nigeria using a long short-term memory (LSTM) deep learning model. The proposed approach employs K-means clustering to detect outliers and principal component analysis (PCA) to select important features from the dataset. LSTM was chosen because COVID-19 case counts follow non-linear patterns, which LSTM networks are well suited to model. For comparison, several other machine learning algorithms, including naive Bayes, XGBoost, and support vector machine (SVM), were also applied to the same task. In this comparison, LSTM outperformed all of the other algorithms.

Keywords: COVID-19; Classification and clustering; LSTM; Machine learning.
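
The abstract does not say how K-means is used to flag outliers, so the following is only a minimal sketch of one common approach, not the authors' implementation: cluster the records, measure each record's distance to its assigned centroid, and flag records whose distance exceeds a multiple of the median distance. The function names (`kmeans`, `flag_outliers`), the evenly spaced initial centroids, the `factor=3.0` threshold, and the toy data are all assumptions made for illustration.

```python
import math
import statistics


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm on a list of equal-length tuples.

    Assumption: initial centroids are taken at evenly spaced positions in
    `points` so the result is deterministic (real pipelines often use
    random or k-means++ initialization instead).
    """
    n = len(points)
    centroids = [points[i * n // k] for i in range(k)]
    labels = [0] * n
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def flag_outliers(points, k=2, factor=3.0):
    """Return indices of points lying unusually far from their centroid.

    Assumption: "unusually far" means more than `factor` times the median
    point-to-centroid distance; the source does not specify a rule.
    """
    centroids, labels = kmeans(points, k)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    threshold = factor * statistics.median(dists)
    return [i for i, d in enumerate(dists) if d > threshold]


# Toy data (hypothetical): two tight groups plus one stray record.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (-8.0, -8.0),
]
outliers = flag_outliers(points, k=2)
print(outliers)  # [8]
```

In a pipeline like the one described, the flagged records would be removed or inspected before PCA and model training. Note that this distance-based rule can miss an outlier that ends up as its own cluster, so the choice of `k` and of the initialization matters.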