Rationale: Automatic prediction algorithms based on routinely collected health data may be able to identify patients at high risk for hospitalizations related to acute exacerbations of chronic obstructive pulmonary disease (COPD).
Objectives: To conduct a proof-of-concept study of a population surveillance approach for identifying individuals at high risk of severe COPD exacerbations.
Methods: We used British Columbia's administrative health databases (1997-2016) to identify patients with diagnosed COPD. We used data from the previous 6 months to predict the risk of severe exacerbation in the next 2 months after a randomly selected index date. We applied statistical and machine-learning algorithms for risk prediction (logistic regression, random forest, neural network, and gradient boosting). We used calibration plots and receiver operating characteristic curves to evaluate model performance based on a randomly chosen future date at least 1 year later (temporal validation).
Results: There were 108,433 patients in the development dataset and 113,786 in the validation dataset; of these, 1,126 and 1,136, respectively, were hospitalized for COPD within their outcome windows. The best prediction algorithm (gradient boosting) had an area under the receiver operating characteristic curve of 0.82 (95% confidence interval, 0.80-0.83), which was significantly higher than the corresponding value for the model with exacerbation history as the only predictor (current standard of care: 0.68). The predicted risk scores were well calibrated in the validation dataset.
Conclusions: Imminent COPD-related hospitalizations can be predicted with good accuracy using administrative health data. This model may be used as a means to target high-risk patients for preventive exacerbation therapies.
Keywords: big data; chronic obstructive pulmonary disease; machine learning; population surveillance; risk prediction.
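
Illustrative sketch: the windowing scheme and the discrimination metric described in the Methods can be made concrete with a few lines of Python. This is a minimal sketch under stated assumptions and is not the study's code. The event label `"copd_hospitalization"`, the function names, the day counts used to approximate 6 and 2 months (182 and 61), and the choice to place the index date itself in the outcome window are all hypothetical. The area under the receiver operating characteristic curve is computed here through its rank interpretation: the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it.

```python
from datetime import date, timedelta


def window_features(events, index_date, lookback_days=182, horizon_days=61):
    """Split one patient's dated events around an index date.

    events: list of (date, kind) tuples; the kind label is an assumed encoding.
    Returns (prior hospitalization count in the lookback window,
             1 if a hospitalization falls in the outcome window, else 0).
    """
    start = index_date - timedelta(days=lookback_days)
    end = index_date + timedelta(days=horizon_days)
    hosp_dates = [d for d, kind in events if kind == "copd_hospitalization"]
    # Predictor window: [start, index_date). Outcome window: [index_date, end).
    prior = sum(1 for d in hosp_dates if start <= d < index_date)
    outcome = int(any(index_date <= d < end for d in hosp_dates))
    return prior, outcome


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical patient with two hospitalizations around an index date.
events = [
    (date(2010, 1, 15), "copd_hospitalization"),
    (date(2010, 5, 1), "copd_hospitalization"),
]
prior_count, outcome = window_features(events, date(2010, 4, 1))

# Toy risk scores for four hypothetical patients (not study data).
toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise computation is quadratic in the number of patients, so it suits small illustrations only; for cohorts of the size reported here, a rank-based implementation from an established library would be the practical choice.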