Construction and validation of a machine learning-based nomogram: A tool to predict the risk of getting severe coronavirus disease 2019 (COVID-19)

Immun Inflamm Dis. 2021 Jun;9(2):595-607. doi: 10.1002/iid3.421. Epub 2021 Mar 13.

Abstract

Background: Identifying patients who may develop severe coronavirus disease 2019 (COVID-19) will facilitate personalized treatment and optimize the distribution of medical resources.

Methods: In this study, 590 hospitalized COVID-19 patients were enrolled (training set: n = 285; internal validation set: n = 127; prospective set: n = 178). After filtering with two machine learning methods in the training set, 5 of 31 clinical features were selected for building a model to predict the risk of developing severe COVID-19. Multivariate logistic regression was used to build the prediction nomogram, which was then validated in two different sets. Receiver operating characteristic (ROC) analysis and decision curve analysis (DCA) were used to evaluate its performance.
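As an illustration of the ROC evaluation step, the sketch below computes an AUC and a 95% confidence interval in plain Python. The abstract does not state how the intervals were obtained, so the percentile bootstrap used here is an assumption, and the function names (`auc`, `bootstrap_ci`) and the toy labels and scores are hypothetical, not study data.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, not
    necessarily the one used in the study)."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to bootstrap the AUC")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contained a single class; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi

# Toy example: 1 = severe, 0 = non-severe; scores are predicted risks.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.35, 0.40, 0.20, 0.80, 0.65, 0.55, 0.90]
toy_auc = auc(toy_labels, toy_scores)
toy_lo, toy_hi = bootstrap_ci(toy_labels, toy_scores)
```

With real data, `toy_scores` would be replaced by the predicted probabilities from the fitted logistic regression model, evaluated separately on each cohort.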

Results: From 31 potential predictors in the training set, 5 independent predictive factors were identified and included in the risk score: C-reactive protein (CRP), lactate dehydrogenase (LDH), age, Charlson/Deyo comorbidity score (CDCS), and erythrocyte sedimentation rate (ESR). We then generated a nomogram based on these features for predicting severe COVID-19. The area under the curve (AUC) was 0.822 (95% CI, 0.765-0.875) in the training cohort and 0.762 (95% CI, 0.768-0.844) in the internal validation cohort. We further validated the nomogram in a prospective cohort, with an AUC of 0.705 (95% CI, 0.627-0.778). The internally bootstrapped calibration curve showed favorable consistency between the nomogram's predictions and the observed outcomes. DCA also indicated a high clinical net benefit.
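For readers unfamiliar with DCA, the following minimal sketch shows how net benefit is commonly computed at a single threshold probability, using the standard definition (true positives per patient minus false positives per patient weighted by the threshold odds). It is not taken from the study; the function names and toy values are hypothetical.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk is at or
    above `threshold`: TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Toy example (not study data): 1 = severe, 0 = non-severe.
dca_labels = [0, 0, 1, 0, 1, 1, 0, 1]
dca_probs = [0.10, 0.35, 0.40, 0.20, 0.80, 0.65, 0.55, 0.90]
nb_model = net_benefit(dca_labels, dca_probs, 0.5)
nb_all = net_benefit_treat_all(dca_labels, 0.5)
```

A decision curve plots these quantities over a range of thresholds; the model is considered useful where its net benefit exceeds both the treat-all strategy and the treat-none strategy (net benefit 0).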

Conclusion: Our prediction model, based on five clinical characteristics of COVID-19 patients, will enable clinicians to predict the potential risk of developing critical illness and thus optimize medical management.

Keywords: COVID-19; machine learning; nomogram; severe COVID-19 prediction.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Adult
  • Aged
  • Area Under Curve
  • COVID-19 / epidemiology*
  • Calibration
  • Decision Support Techniques
  • Female
  • Humans
  • Logistic Models
  • Machine Learning*
  • Male
  • Middle Aged
  • Models, Theoretical*
  • Nomograms*
  • Pandemics*
  • Prospective Studies
  • ROC Curve
  • Retrospective Studies
  • Risk Assessment
  • Risk Factors
  • SARS-CoV-2*
  • Sensitivity and Specificity