Machine learning-driven development of a disease risk score for COVID-19 hospitalization and mortality: a Swedish and Norwegian register-based study

Front Public Health. 2023 Dec 7:11:1258840. doi: 10.3389/fpubh.2023.1258840. eCollection 2023.


Aims: To develop a disease risk score for COVID-19-related hospitalization and mortality in Sweden and externally validate it in Norway.

Method: We used linked data from the national health registries of Sweden and Norway. The study cohort comprised individuals in Sweden with SARS-CoV-2 infection confirmed by RT-PCR testing up to August 2022. Within this cohort, we defined hospitalized cases as individuals admitted to hospital within 14 days of a positive SARS-CoV-2 test and matched each case with five controls from the same cohort who were not hospitalized due to SARS-CoV-2. We additionally identified individuals who died within 30 days of a COVID-19 hospitalization. As candidate predictors for the disease risk scores, we considered demographics, infectious, somatic, and mental health conditions, recorded diagnoses, and pharmacological treatments. We also conducted age-specific analyses and assessed model performance with 5-fold cross-validation. Finally, we externally validated the scores using data from the Norwegian population with COVID-19 up to December 2021.
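The cohort construction described in the Method can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Person` record, its field names, and all helper functions are hypothetical; controls are drawn by simple random sampling because the abstract does not state the matching variables; and the fold split is plain (unstratified) 5-fold partitioning.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical record layout; the registries' real variables are not described in the abstract.
@dataclass
class Person:
    pid: int
    positive_test: date               # date of positive RT-PCR test
    admission: Optional[date] = None  # hospital admission date, if any
    death: Optional[date] = None      # date of death, if any

HOSP_WINDOW = timedelta(days=14)   # case: admitted within 14 days of the positive test
DEATH_WINDOW = timedelta(days=30)  # mortality outcome: death within 30 days of admission
CONTROLS_PER_CASE = 5

def is_case(p: Person) -> bool:
    """Hospitalized case: admitted 0 to 14 days after testing positive."""
    if p.admission is None:
        return False
    return timedelta(0) <= p.admission - p.positive_test <= HOSP_WINDOW

def died_within_30_days(p: Person) -> bool:
    """Among cases: death 0 to 30 days after the COVID-19 admission."""
    if not is_case(p) or p.death is None:
        return False
    return timedelta(0) <= p.death - p.admission <= DEATH_WINDOW

def match_controls(cohort: List[Person], k: int = CONTROLS_PER_CASE,
                   seed: int = 0) -> Dict[int, List[int]]:
    """Assign k distinct controls to each case, sampled without replacement.

    Assumption: plain random sampling from cohort members who are not cases;
    the abstract does not state which variables (if any) were matched on.
    """
    rng = random.Random(seed)
    cases = [p for p in cohort if is_case(p)]
    pool = [p for p in cohort if not is_case(p)]
    if len(pool) < k * len(cases):
        raise ValueError("not enough potential controls")
    rng.shuffle(pool)
    return {case.pid: [c.pid for c in pool[i * k:(i + 1) * k]]
            for i, case in enumerate(cases)}

def k_fold_indices(n: int, folds: int = 5,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every index is in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    for f in range(folds):
        test = idx[f::folds]
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        yield train, test

if __name__ == "__main__":
    d0 = date(2021, 1, 1)
    case = Person(0, d0, admission=d0 + timedelta(days=3), death=d0 + timedelta(days=20))
    cohort = [case] + [Person(i, d0) for i in range(1, 8)]
    print(match_controls(cohort))     # case 0 mapped to five distinct pids from 1..7
    print(died_within_30_days(case))  # True: death 17 days after admission
    for train, test in k_fold_indices(len(cohort)):
        print(len(train), len(test))
```

Under these assumptions, a person admitted 20 days after the positive test is not a case and stays in the control pool; a registry-based study may apply further exclusion rules that the abstract does not describe.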

Results: During the study period, 124,560 individuals in Sweden were hospitalized, and 15,877 died within 30 days of COVID-19 hospitalization. Across the entire study period, the disease risk scores showed predictive ability, with ROC-AUC values of 0.70 for hospitalization and 0.72 for mortality. Higher scores were associated with a higher likelihood of hospitalization or death. In external validation in the Norwegian COVID-19 population (53,744 individuals), the disease risk score yielded an AUC of 0.47 for hospitalization, slightly below the chance level of 0.5, and an AUC of 0.74 for death.
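The AUC values in the Results are easiest to read through the rank-based definition of ROC-AUC: the probability that a randomly chosen individual who experienced the outcome receives a higher risk score than one who did not, with ties counted as one half. The abstract does not say how the AUCs were computed; the sketch below, built around a hypothetical helper `roc_auc`, only illustrates the metric and why 0.5 corresponds to chance while values below 0.5 mean the score ranks outcomes slightly worse than random.

```python
from typing import Sequence

def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the Mann-Whitney formulation.

    Fraction of (positive, negative) pairs in which the positive (label 1)
    has the higher score; tied pairs count 0.5. O(n_pos * n_neg), fine for
    illustration but not for registry-sized data.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    y = [1, 1, 0, 0]
    s = [0.9, 0.4, 0.5, 0.1]
    print(roc_auc(y, s))                # 0.75: 3 of 4 positive-negative pairs ranked correctly
    print(roc_auc(y, [-v for v in s]))  # 0.25: reversing the ranking gives 1 - AUC
    print(roc_auc(y, [0.3] * 4))        # 0.5: an uninformative (constant) score
```

Reversing a score turns an AUC of A into 1 − A, which is why an AUC of 0.47 indicates essentially no discriminative ability rather than a usable inverse signal.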

Conclusion: The disease risk score showed moderately good performance in predicting COVID-19-related mortality but performed poorly in predicting hospitalization when externally validated.

Keywords: COVID-19; artificial intelligence; disease risk score; machine learning; prediction modeling.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19* / epidemiology
  • Hospitalization
  • Humans
  • Machine Learning
  • Risk Factors
  • SARS-CoV-2
  • Sweden / epidemiology

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was performed as part of the Nordic COHERENCE project (project no. 105670), funded by NordForsk under the Nordic Council of Ministers, and the EU-COVID-19 project (project no. 312707), funded by the Norwegian Research Council’s COVID-19 Emergency Call. The Pharmacovigilance Research Center was supported by a grant from the Novo Nordisk Foundation to the University of Copenhagen (NNF15SA0018404). The SCIFI-PEARL project, which supplies the data for the Swedish part of this analysis, has basic funding from grants from the Swedish state under the agreement between the Swedish government and the county councils, the ALF-agreement (Avtal om Läkarutbildning och Forskning/Medical Training and Research Agreement), grants ALFGBG-938453, ALFGBG-971130, and ALFGBG-978954, and previously from a joint grant from Forte (Swedish Research Council for Health, Working Life and Welfare) and FORMAS (Forskningsrådet för miljö, areella näringar och samhällsbyggande/Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning), grant 2020-02828. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.