Explorative data analysis techniques and unsupervised clustering methods to support clinical assessment of Chronic Obstructive Pulmonary Disease (COPD) phenotypes

J Biomed Inform. 2009 Dec;42(6):1013-21. doi: 10.1016/j.jbi.2009.05.008. Epub 2009 Jun 6.


Chronic Obstructive Pulmonary Disease (COPD) is the fourth leading cause of death worldwide and one of the major causes of chronic morbidity. Cigarette smoking is the most important risk factor for COPD. In these patients, airflow limitation is caused by a mixture of small airways disease and parenchymal destruction, the relative contributions of which vary from person to person. This twofold nature of the pathology has been studied in the past, and according to some authors each patient should be classified as presenting a predominantly bronchial or a predominantly emphysematous phenotype. In this study we applied various explorative analysis techniques (Principal Component Analysis, PCA; Multiple Correspondence Analysis, MCA; Multidimensional Scaling, MDS) and recent unsupervised clustering methods (K-Harmonic Means, KHM) to a large dataset acquired from 415 COPD patients, to assess whether the data contain hidden structures corresponding to the different COPD phenotypes observed in clinical practice. To validate our methods, we compared the results obtained from the training set of 415 patients with lung density data acquired in a test set of 93 patients who underwent HRCT (High Resolution Computerized Tomography).
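
As an illustration of the clustering step named in the abstract, the following is a minimal sketch of KHM (K-Harmonic Means) clustering in plain Python. It is not the authors' implementation. The abstract does not specify the algorithm's formulation or settings, so the update rule used here (the commonly cited membership and weight form), the exponent p = 3.5, the farthest-point initialization, and all function names are assumptions. The two-group data are synthetic and are not taken from the study.

```python
import math
import random


def farthest_point_init(points, k):
    """Assumed initialization: each new center is the point farthest from those already chosen."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda x: min(math.dist(x, c) for c in centers))
        centers.append(list(nxt))
    return centers


def k_harmonic_means(points, k, p=3.5, iters=30, eps=1e-8):
    """Sketch of KHM: every point influences every center through soft memberships and weights."""
    dim = len(points[0])
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        num = [[0.0] * dim for _ in range(k)]
        den = [0.0] * k
        for x in points:
            # Distances to all centers, floored at eps to avoid division by zero.
            d = [max(math.dist(x, c), eps) for c in centers]
            a = [dj ** (-p - 2) for dj in d]
            sa = sum(a)
            b = sum(dj ** (-p) for dj in d)
            # Point weight: larger for points that are far from every center.
            weight = sa / (b * b)
            for j in range(k):
                # Soft membership of x in center j, multiplied by the point weight.
                mw = (a[j] / sa) * weight
                den[j] += mw
                for t in range(dim):
                    num[j][t] += mw * x[t]
        # Each center becomes the weighted mean of all points.
        centers = [[num[j][t] / den[j] for t in range(dim)] for j in range(k)]
    # Hard labels for reporting: index of the nearest final center.
    labels = [min(range(k), key=lambda j: math.dist(x, centers[j])) for x in points]
    return centers, labels


# Synthetic two-group data (hypothetical, for illustration only).
rng = random.Random(0)
group_a = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(40)]
group_b = [(rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)) for _ in range(40)]
centers, labels = k_harmonic_means(group_a + group_b, k=2)
```

On real clinical variables, the inputs would normally be standardized first, and the number of clusters would be chosen by a validity criterion rather than fixed in advance as it is in this sketch.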

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • Cluster Analysis*
  • Computational Biology / methods*
  • Databases, Factual*
  • Humans
  • Medical Informatics / methods*
  • Phenotype
  • Principal Component Analysis
  • Pulmonary Disease, Chronic Obstructive / metabolism
  • Pulmonary Disease, Chronic Obstructive / pathology*
  • Pulmonary Disease, Chronic Obstructive / physiopathology
  • Reproducibility of Results