A multivariate data analysis approach for investigating daily statistics of countries affected with COVID-19 pandemic

Heliyon. 2020 Nov;6(11):e05575. doi: 10.1016/j.heliyon.2020.e05575. Epub 2020 Nov 24.


Background: To understand the impact and scale of the coronavirus disease 2019 (COVID-19) crisis, univariate analysis of the datasets reported daily is tedious. To capture the full picture and compare the situations and consequences in different countries, multivariate analytical models are suggested, as they allow the situation of different countries to be visualized and compared more accurately and precisely.

Aims: We aimed to use data analysis tools that display the relative positions of data points in fewer dimensions while retaining as much of the variation in the original dataset as possible, and to cluster countries according to their scores on the resulting dimensions.
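A minimal sketch of the dimension-reduction step, assuming PCA on standardized variables computed through a singular value decomposition with NumPy. The function name `pca_scores`, the five-variable layout and the synthetic data are hypothetical illustrations, not the authors' code or data; whether the study standardized its variables is also an assumption. The explained-variance ratios show how much of the original variation the retained dimensions keep.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Return PCA scores, loadings and explained-variance ratios.

    Columns of X (countries x variables) are standardized to z-scores first
    (an assumption), so that large counts such as total cases do not
    dominate smaller ones such as critical cases purely because of scale.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Thin SVD: Z = U @ diag(S) @ Vt; the rows of Vt are the principal axes.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)   # share of total variance per component
    loadings = Vt[:n_components].T    # variables x components
    scores = Z @ loadings             # countries x components
    return scores, loadings, explained[:n_components]


# Hypothetical data: 20 countries whose counts all scale with outbreak size.
rng = np.random.default_rng(0)
size = rng.lognormal(mean=8.0, sigma=1.5, size=20)
X = np.column_stack([
    size,                                 # total cases
    size * rng.uniform(0.01, 0.10, 20),   # total deaths
    size * rng.uniform(0.20, 0.80, 20),   # active cases
    size * rng.uniform(0.005, 0.03, 20),  # critical cases
    size * rng.uniform(0.10, 0.70, 20),   # total recoveries
])

scores, loadings, ratio = pca_scores(X, n_components=2)
print(scores.shape)  # (20, 2)
print(ratio)         # fraction of total variance kept by each retained axis
```

The sign of each component is arbitrary (flipping a column of `loadings` together with the matching column of `scores` gives an equivalent solution), so labels such as "Disease Magnitude" come from inspecting which variables have large loadings on a component, not from its sign.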

Methods: Principal component analysis (PCA) and the partitioning around medoids (PAM) clustering algorithm were used to analyze data from 56, 82, and 91 countries with COVID-19 at three time points. Eligible countries were those with 500 or more total cases and no missing data.
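The pipeline described in the Methods can be sketched as follows: apply the eligibility rule, reduce the variables to two PCA scores, then cluster countries on those scores with a simplified PAM (a greedy BUILD phase, then SWAP steps that accept any medoid/non-medoid exchange lowering the total distance). This is a minimal sketch, not the authors' implementation: the country names and values are hypothetical, and the standardization, Euclidean distance and `k=2` are assumptions, since the abstract does not report the scaling, distance metric or number of clusters.

```python
import numpy as np


def pam(D, k, max_iter=100):
    """Simplified Partitioning Around Medoids on a precomputed distance matrix.

    BUILD: pick medoids greedily, each one lowering the total distance most.
    SWAP: exchange a medoid with a non-medoid whenever that lowers the total
    distance of points to their nearest medoid; stop when no swap helps.
    """
    n = D.shape[0]
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        nearest = D[:, medoids].min(axis=1)
        candidates = [c for c in range(n) if c not in medoids]
        build_costs = [np.minimum(nearest, D[:, c]).sum() for c in candidates]
        medoids.append(candidates[int(np.argmin(build_costs))])
    cost = D[:, medoids].min(axis=1).sum()
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = h
                trial_cost = D[:, trial].min(axis=1).sum()
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True
        if not improved:
            break
    labels = np.argmin(D[:, medoids], axis=1)  # index of nearest medoid
    return np.array(medoids), labels, cost


# Hypothetical country table (not the study's data). Columns: total cases,
# total deaths, active cases, critical cases, total recoveries.
names = np.array(["A", "B", "C", "D", "E", "F", "G", "H"])
X = np.array([
    [120000, 9000, 60000, 3000, 51000],
    [110000, 8000, 55000, 2800, 47000],
    [3000, 40, 1000, 20, 1960],
    [2500, 30, 900, 15, 1570],
    [400, 5, 300, 2, 95],            # fewer than 500 cases
    [5000, 600, 1500, 80, 2900],
    [4800, np.nan, 1400, 70, 2800],  # missing value
    [4500, 550, 1300, 75, 2650],
], dtype=float)

# Eligibility rule from the abstract: >= 500 total cases and no missing data.
eligible = (X[:, 0] >= 500) & ~np.isnan(X).any(axis=1)
names, X = names[eligible], X[eligible]

# Two PCA scores per country from standardized variables.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
P = Z @ Vt[:2].T

# Euclidean distances between countries in the two-dimensional score space.
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
medoids, labels, cost = pam(D, k=2)
for name, label in zip(names, labels):
    print(name, label)
```

With this toy table, E (fewer than 500 cases) and G (a missing value) are dropped, the two large outbreaks A and B fall into one cluster, and the four small ones form the other. Because medoids are actual data points, each cluster is represented by a real country, and PAM is less sensitive to outliers than k-means; in practice a library routine such as R's `cluster::pam` would normally be used instead of a hand-written loop.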

Results: PCA yielded two scores: a Disease Magnitude score, representing total cases, total deaths, total active cases, and critically ill cases, and a Mortality Recovery Ratio score, representing the ratio of total deaths to total recoveries in a given country.
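As a hedged illustration of what a component "score" is: assuming the variables are standardized before PCA (the abstract does not say), country $i$'s score $t_{ik}$ on component $k$ is a weighted sum of its standardized values, and the raw quantity that the second score is described as reflecting is the deaths-to-recoveries ratio $r_i$. The abstract does not report the loadings, so they are left symbolic.

$$
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad
t_{ik} = \sum_{j=1}^{p} v_{jk}\, z_{ij}, \qquad
r_i = \frac{\text{total deaths}_i}{\text{total recoveries}_i}
$$

Here $x_{ij}$ is the value of variable $j$ for country $i$, $\bar{x}_j$ and $s_j$ are that variable's mean and standard deviation across countries, $v_{jk}$ is the loading of variable $j$ on component $k$, and $p$ is the number of variables. A component is named after the variables whose loadings $v_{jk}$ are largest in absolute value.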

Conclusion: Accurate multivariate analyses can be of great value: they can simplify difficult concepts, help explore and communicate findings from health datasets, and support decision-making.

Keywords: Applied mathematics; COVID-19; Clustering; Multivariate; PAM; PCA; Virology; Visualization.