Clinical Characteristics and Prognostic Factors for Intensive Care Unit Admission of Patients With COVID-19: Retrospective Study Using Machine Learning and Natural Language Processing

J Med Internet Res. 2020 Oct 28;22(10):e21801. doi: 10.2196/21801.


Background: Many factors involved in the onset and clinical course of the ongoing COVID-19 pandemic are still unknown. Although big data analytics and artificial intelligence are widely used in the realms of health and medicine, researchers are only beginning to use these tools to explore the clinical characteristics and predictive factors of patients with COVID-19.

Objective: Our primary objectives are to describe the clinical characteristics and determine the factors that predict intensive care unit (ICU) admission of patients with COVID-19. Determining these factors using a well-defined population can increase our understanding of the real-world epidemiology of the disease.

Methods: We used a combination of classic epidemiological methods, natural language processing (NLP), and machine learning (for predictive modeling) to analyze the electronic health records (EHRs) of patients with COVID-19. We explored the unstructured free text in the EHRs within the Servicio de Salud de Castilla-La Mancha (SESCAM) Health Care Network (Castilla-La Mancha, Spain), covering the entire population with available EHRs (1,364,924 patients) from January 1 to March 29, 2020. We extracted related clinical information regarding diagnosis, progression, and outcome for all COVID-19 cases.

Results: A total of 10,504 patients with a clinical or polymerase chain reaction-confirmed diagnosis of COVID-19 were identified; 5519 (52.5%) were male, with a mean age of 58.2 years (SD 19.7). Upon admission, the most common symptoms were cough, fever, and dyspnea; however, all three symptoms occurred in fewer than half of the cases. Overall, 6.1% (83/1353) of hospitalized patients required ICU admission. Using a machine-learning, data-driven algorithm, we identified that a combination of age, fever, and tachypnea was the most parsimonious predictor of ICU admission; patients younger than 56 years, without tachypnea, and with a temperature <39 °C (or >39 °C without respiratory crackles) were not admitted to the ICU. In contrast, patients with COVID-19 aged 40 to 79 years were likely to be admitted to the ICU if they had tachypnea and delayed their visit to the emergency department after being seen in primary care.
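
The low-risk profile reported above can be written out as an explicit rule, which makes its branches easier to read. The following is a minimal sketch only, not the study's fitted model: the `Patient` fields and the `matches_low_icu_risk_profile` function are hypothetical names introduced for illustration, and the handling of a temperature of exactly 39 °C is an assumption, because the abstract gives thresholds only for values below and above 39 °C.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record holding the four variables named in the Results."""
    age_years: float
    tachypnea: bool
    temperature_c: float
    respiratory_crackles: bool


def matches_low_icu_risk_profile(p: Patient) -> bool:
    """Return True if the patient fits the profile that was not admitted to the ICU.

    Profile from the abstract: younger than 56 years, no tachypnea, and either
    temperature < 39 C, or temperature > 39 C without respiratory crackles.
    """
    if p.age_years >= 56 or p.tachypnea:
        return False
    if p.temperature_c < 39.0:
        return True
    if p.temperature_c > 39.0:
        return not p.respiratory_crackles
    # Assumption: exactly 39.0 C is not covered by the abstract, so it is
    # conservatively treated as not matching the low-risk profile.
    return False


# Illustrative inputs (invented values, not study data).
print(matches_low_icu_risk_profile(Patient(45, False, 38.2, False)))
print(matches_low_icu_risk_profile(Patient(45, False, 39.5, True)))
```

A rule of this kind only restates the reported pattern in this retrospective cohort; it is not a validated triage tool, and the second pattern in the Results (patients aged 40 to 79 years with tachypnea and a delayed emergency department visit) is deliberately left out because the abstract does not quantify the delay.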

Conclusions: Our results show that a combination of easily obtainable clinical variables (age, fever, and tachypnea with or without respiratory crackles) predicts whether patients with COVID-19 will require ICU admission.

Keywords: COVID-19; SARS-CoV-2; artificial intelligence; big data; electronic health records; predictive model; tachypnea.

MeSH terms

  • Adult
  • Aged
  • Betacoronavirus
  • COVID-19
  • Coronavirus Infections / diagnosis*
  • Coronavirus Infections / epidemiology
  • Coronavirus Infections / therapy
  • Electronic Health Records / statistics & numerical data*
  • Emergency Service, Hospital
  • Female
  • Hospitalization / statistics & numerical data*
  • Humans
  • Intensive Care Units / statistics & numerical data*
  • Machine Learning*
  • Male
  • Middle Aged
  • Natural Language Processing*
  • Pandemics
  • Pneumonia, Viral / diagnosis*
  • Pneumonia, Viral / epidemiology
  • Pneumonia, Viral / therapy
  • Prognosis
  • Retrospective Studies
  • SARS-CoV-2
  • Spain / epidemiology
  • Treatment Outcome