Early prediction of mortality risk among patients with severe COVID-19, using machine learning

Int J Epidemiol. 2021 Jan 23;49(6):1918-1929. doi: 10.1093/ije/dyaa171.


Background: Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 infection, has been spreading globally. We aimed to develop a clinical model for early prediction of the outcome of patients with severe COVID-19 infection.

Methods: Demographic, clinical and first laboratory findings after admission of 183 patients with severe COVID-19 infection (115 survivors and 68 non-survivors from the Sino-French New City Branch of Tongji Hospital, Wuhan) were used to develop the predictive models. Machine learning approaches were used to select the features and predict the patients' outcomes. The area under the receiver operating characteristic curve (AUROC) was applied to compare the models' performance. A total of 64 patients with severe COVID-19 infection from the Optical Valley Branch of Tongji Hospital, Wuhan, were used to externally validate the final predictive model.
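
As an illustration of the AUROC metric named in the Methods, the following is a minimal sketch and not the authors' code. It uses the rank-based definition: the AUROC equals the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The function name `auroc` and the toy labels and scores are hypothetical and are not data from the study.

```python
def auroc(labels, scores):
    """Rank-based AUROC: P(score of a positive > score of a negative), ties count 0.5.

    labels: 1 for the event (here, death), 0 otherwise.
    scores: predicted risk for each patient, in the same order as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not study data).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
toy_auc = auroc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for cohorts of this size; a sort-based implementation would be preferable for large datasets.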

Results: The baseline characteristics and laboratory tests were significantly different between the survivors and non-survivors. Four variables (age, high-sensitivity C-reactive protein level, lymphocyte count and d-dimer level) were selected by all five models. Given the similar performance among the models, the logistic regression model was selected as the final predictive model because of its simplicity and interpretability. The AUROC for the external validation set was 0.881. The sensitivity and specificity were 0.839 and 0.794 for the validation set, when using a probability of death of 50% as the cutoff. A risk score based on the selected variables can be used to assess the mortality risk. The predictive model is available at [https://phenomics.fudan.edu.cn/risk_scores/].
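
To make the cutoff-based metrics in the Results concrete, the sketch below shows how sensitivity and specificity could be computed from predicted probabilities of death at a 50% cutoff. It is an illustration under stated assumptions, not the authors' implementation: the function name `sensitivity_specificity` is hypothetical, the toy values are not study data, and classifying a probability exactly equal to the cutoff as a predicted death is an assumption, since the abstract does not specify how ties are handled.

```python
def sensitivity_specificity(labels, probs, cutoff=0.5):
    """Return (sensitivity, specificity) for a given probability cutoff.

    labels: 1 for death, 0 for survival.
    probs: predicted probability of death for each patient.
    Assumption: a probability >= cutoff is classified as a predicted death.
    """
    tp = fn = tn = fp = 0
    for y, p in zip(labels, probs):
        predicted_death = p >= cutoff
        if y == 1:
            if predicted_death:
                tp += 1  # non-survivor correctly flagged
            else:
                fn += 1  # non-survivor missed
        else:
            if predicted_death:
                fp += 1  # survivor incorrectly flagged
            else:
                tn += 1  # survivor correctly cleared
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one death and one survivor")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example (hypothetical values, not study data).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_probs = [0.9, 0.55, 0.3, 0.1, 0.6, 0.4, 0.2]
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_probs, cutoff=0.5)
print(round(toy_sens, 3), round(toy_spec, 3))
```

Lowering the cutoff would flag more patients as high risk, raising sensitivity at the cost of specificity; the 50% value is simply the operating point reported in the abstract.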

Conclusions: Age, high-sensitivity C-reactive protein level, lymphocyte count and d-dimer level of COVID-19 patients at admission are informative for the patients' outcomes.

Keywords: COVID-19; death; fatality rate; machine learning; predictive model.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • COVID-19 / diagnosis*
  • COVID-19 / mortality*
  • Case-Control Studies
  • Female
  • Hospitalization / statistics & numerical data
  • Hospitals
  • Humans
  • Machine Learning / standards*
  • Male
  • Middle Aged
  • Patient Admission / statistics & numerical data*
  • ROC Curve
  • Risk Assessment / methods
  • Risk Assessment / standards
  • SARS-CoV-2*
  • Sensitivity and Specificity