Machine Learning Characterization of COPD Subtypes: Insights From the COPDGene Study

Chest. 2020 May;157(5):1147-1157. doi: 10.1016/j.chest.2019.11.039. Epub 2019 Dec 28.


COPD is a heterogeneous syndrome. Many COPD subtypes have been proposed, but there is not yet consensus on how many subtypes exist or how they should be defined. The COPD Genetic Epidemiology Study (COPDGene), which has generated 10-year longitudinal chest imaging, spirometry, and molecular data, is a rich resource for relating COPD phenotypes to underlying genetic and molecular mechanisms. In this article, we place COPDGene clustering studies in the context of other highly cited COPD clustering studies and summarize four main COPD subtype findings from COPDGene. First, most manifestations of COPD occur along a continuum, which explains why continuous aspects of COPD, or disease axes, may be more accurate and reproducible than subtypes identified through clustering methods. Second, continuous COPD-related measures can be used to create subgroups by using predictive models to define cut-points, and we review COPDGene research on blood eosinophil count thresholds as a specific example. Third, COPD phenotypes identified or prioritized through machine learning methods have led to novel biological discoveries, including novel emphysema genetic risk variants and systemic inflammatory subtypes of COPD. Fourth, trajectory-based COPD subtyping captures differences in the longitudinal evolution of COPD, addressing a major limitation of clustering analyses that are confounded by disease severity. Ongoing longitudinal characterization of subjects in COPDGene will provide useful insights into the relationship between lung imaging parameters, molecular markers, and COPD progression. These insights will enable the identification of subtypes based on underlying disease processes and distinct patterns of disease progression, with the potential to improve the clinical relevance and reproducibility of COPD subtypes.
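The idea of deriving a cut-point for a continuous measure from its relationship to an outcome can be illustrated with a minimal sketch. The abstract does not state which predictive models were used, so the example below substitutes one common and simple criterion, Youden's J (sensitivity + specificity - 1), and should not be read as the COPDGene method. The function name `youden_cut_point`, the variable names, and all values are hypothetical and synthetic; they are not study data.

```python
def youden_cut_point(values, outcomes):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    A subject is classified as positive when value >= threshold.
    Ties are resolved in favor of the lowest threshold.
    """
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes are required")

    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, outcomes) if v >= threshold and y == 1)
        true_neg = sum(1 for v, y in zip(values, outcomes) if v < threshold and y == 0)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:  # strict comparison keeps the lowest tied threshold
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Synthetic, illustrative values only (assumed marker levels and binary outcomes).
marker = [100, 150, 200, 250, 300, 350, 400, 450]
event = [0, 0, 0, 1, 0, 1, 1, 1]

threshold, j = youden_cut_point(marker, event)
print(threshold, j)
```

On this toy input the search selects a threshold of 250 with J = 0.75. In practice a cut-point chosen this way would need validation in independent data, since thresholds tuned to a single sample tend to be optimistic.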

Keywords: COPD; emphysema; machine learning.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Cluster Analysis
  • Diagnostic Imaging
  • Disease Progression
  • Genetic Predisposition to Disease
  • Genome-Wide Association Study
  • Humans
  • Machine Learning*
  • Molecular Epidemiology*
  • Phenotype
  • Pulmonary Disease, Chronic Obstructive / classification*
  • Pulmonary Disease, Chronic Obstructive / epidemiology*
  • Pulmonary Disease, Chronic Obstructive / genetics*
  • Respiratory Function Tests