A machine learning-based universal outbreak risk prediction tool

Comput Biol Med. 2024 Feb:169:107876. doi: 10.1016/j.compbiomed.2023.107876. Epub 2023 Dec 24.

Abstract

In order to prevent and control the increasing number of serious epidemics, the ability to predict the risk caused by emerging outbreaks is essential. However, most current risk prediction tools, except EPIRISK, are limited by being designed for targeting only one specific disease and one country. Differences between countries and diseases (e.g., different economic conditions, different modes of transmission, etc.) pose challenges for building models with cross-country and cross-disease prediction capabilities. The limitation of universality affects domestic and international efforts to control and prevent pandemic outbreaks. To address this problem, we used outbreak data from 43 diseases in 206 countries to develop a universal risk prediction system that can be used across countries and diseases. This system used five machine learning models (including Neural Network XGBoost, Logistic Boost, Random Forest and Kernel SVM) to predict and vote together to make ensemble predictions. It can make predictions with around 80%-90 % accuracy from economic, cultural, social, and epidemiological factors. Three different datasets were designed to test the performance of ML models under different realistic situations. This prediction system has strong predictive ability, adaptability, and generality. It can give universal outbreak risk assessment that are not limited by border or disease type, facilitate rapid response to pandemic outbreaks, government decision-making and international cooperation.

Keywords: Epidemics; Machine learning; Outbreak risk prediction; Public health.

MeSH terms

  • Disease Outbreaks*
  • Machine Learning
  • Neural Networks, Computer*
  • Pandemics
  • Support Vector Machine