Predicting Severe Chronic Obstructive Pulmonary Disease Exacerbations. Developing a Population Surveillance Approach with Administrative Data

Ann Am Thorac Soc. 2020 Sep;17(9):1069-1076. doi: 10.1513/AnnalsATS.202001-070OC.


Rationale: Automatic prediction algorithms based on routinely collected health data may be able to identify patients at high risk for hospitalizations related to acute exacerbations of chronic obstructive pulmonary disease (COPD).

Objectives: To conduct a proof-of-concept study of a population surveillance approach for identifying individuals at high risk of severe COPD exacerbations.

Methods: We used British Columbia's administrative health databases (1997-2016) to identify patients with diagnosed COPD. We used data from the previous 6 months to predict the risk of severe exacerbation in the next 2 months after a randomly selected index date. We applied statistical and machine-learning algorithms for risk prediction (logistic regression, random forest, neural network, and gradient boosting). We used calibration plots and receiver operating characteristic curves to evaluate model performance based on a randomly chosen future date at least 1 year later (temporal validation).

Results: There were 108,433 patients in the development dataset and 113,786 in the validation dataset; of these, 1,126 and 1,136, respectively, were hospitalized for COPD within their outcome windows. The best prediction algorithm (gradient boosting) had an area under the receiver operating characteristic curve of 0.82 (95% confidence interval, 0.80-0.83), which was significantly higher than the corresponding value for the model with exacerbation history as the only predictor (current standard of care: 0.68). The predicted risk scores were well calibrated in the validation dataset.

Conclusions: Imminent COPD-related hospitalizations can be predicted with good accuracy using administrative health data. This model may be used as a means to target high-risk patients for preventive exacerbation therapies.
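The windowing and evaluation design described in the Methods (a lookback window before a random index date, an outcome window after it, a second index date at least 1 year later for temporal validation, and discrimination measured by the area under the ROC curve) can be illustrated with a minimal Python sketch. This is not the authors' code. The record structure (Patient, copd_admissions), all helper names, and the day-count approximations of "6 months" (183 days) and "2 months" (61 days) are assumptions for illustration only. The AUC is computed as the rank-based (Mann-Whitney) probability that a randomly chosen patient with the outcome scores higher than one without it, and the exacerbation-history count stands in as the score for the single-predictor comparison model.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field
from datetime import date, timedelta

# Assumption: "previous 6 months" and "next 2 months" approximated in days.
LOOKBACK_DAYS = 183
OUTCOME_DAYS = 61
# Temporal validation: validation index date at least 1 year after the development one.
MIN_GAP_DAYS = 365


@dataclass
class Patient:
    # Hypothetical minimal record: only the dates of COPD-related hospitalizations.
    pid: str
    copd_admissions: list[date] = field(default_factory=list)


def random_index_date(first: date, last: date, rng: random.Random) -> date:
    """Draw an index date uniformly between first and last (inclusive)."""
    return first + timedelta(days=rng.randint(0, (last - first).days))


def temporal_validation_date(dev_index: date, last: date, rng: random.Random) -> date:
    """Draw a validation index date at least MIN_GAP_DAYS after the development one."""
    earliest = dev_index + timedelta(days=MIN_GAP_DAYS)
    if earliest > last:
        raise ValueError("not enough follow-up for temporal validation")
    return random_index_date(earliest, last, rng)


def make_example(p: Patient, index: date) -> tuple[int, int]:
    """Return (admissions in the lookback window, outcome label for the outcome window)."""
    start = index - timedelta(days=LOOKBACK_DAYS)
    end = index + timedelta(days=OUTCOME_DAYS)
    history = sum(start <= d < index for d in p.copd_admissions)
    outcome = int(any(index < d <= end for d in p.copd_admissions))
    return history, outcome


def auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC as P(random positive scores above random negative); ties count 1/2.

    Quadratic in the number of patients: fine for a sketch, not for 100,000 rows.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    patients = [
        Patient("a", [date(2010, 1, 10), date(2010, 5, 1)]),
        Patient("b"),
        Patient("c", [date(2010, 4, 20)]),
    ]
    index = date(2010, 4, 1)
    rows = [make_example(p, index) for p in patients]
    # Single-predictor baseline: the exacerbation-history count is the risk score.
    history_scores = [h for h, _ in rows]
    labels = [y for _, y in rows]
    print(rows)                          # [(1, 1), (0, 0), (0, 1)]
    print(auc(history_scores, labels))   # 0.75
```

In this toy example patient "c" has no prior admission but is hospitalized within the outcome window, which is why a history-only score cannot fully separate the classes; the study's multivariable models (for example, gradient boosting) draw on many more administrative-data features than this single count.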

Keywords: big data; chronic obstructive pulmonary disease; machine learning; population surveillance; risk prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Aged, 80 and over
  • British Columbia / epidemiology
  • Disease Progression
  • Female
  • Hospitalization
  • Humans
  • Logistic Models
  • Machine Learning
  • Male
  • Middle Aged
  • Population Surveillance
  • Pulmonary Disease, Chronic Obstructive / diagnosis*
  • Pulmonary Disease, Chronic Obstructive / epidemiology*
  • Pulmonary Disease, Chronic Obstructive / therapy*
  • ROC Curve