Unsupervised gene expression analyses identify IPF-severity correlated signatures, associated genes and biomarkers

BMC Pulm Med. 2017 Oct 20;17(1):133. doi: 10.1186/s12890-017-0472-9.


Background: Idiopathic Pulmonary Fibrosis (IPF) is a fatal fibrotic lung disease occurring predominantly in middle-aged and older adults. The traditional diagnostic classification of IPF is based on clinical, radiological, and histopathological features. However, the considerable heterogeneity in IPF presentation suggests that differences in gene expression profiles can help to characterize and distinguish disease severity.

Methods: We combined data-driven unsupervised clustering analysis with a knowledge-based approach to identify and characterize IPF subgroups.
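As an illustration of what unsupervised clustering of expression profiles involves, the following minimal sketch groups samples by average-linkage agglomerative clustering on a correlation distance. The abstract does not name the clustering algorithm or distance measure used in the study, so both are assumptions here; the data are synthetic and the function names are hypothetical.

```python
import math
import random


def pearson_distance(a, b):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


def agglomerative_clusters(profiles, n_clusters):
    """Average-linkage agglomerative clustering (assumed method, not the study's).

    Returns a list of clusters, each a sorted list of sample indices.
    """
    n = len(profiles)
    clusters = [[i] for i in range(n)]
    # Precompute all pairwise sample distances once.
    dist = {(i, j): pearson_distance(profiles[i], profiles[j])
            for i in range(n) for j in range(i + 1, n)}

    def linkage(c1, c2):
        # Mean distance over all cross-cluster sample pairs.
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the two closest clusters.
        a, b = min(
            ((x, y) for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda xy: linkage(clusters[xy[0]], clusters[xy[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Synthetic data: 3 hypothetical subgroups of 4 samples, 18 genes each.
# Each subgroup over-expresses its own block of 6 genes.
random.seed(0)
n_genes, block = 18, 6
profiles = []
for group in range(3):
    for _ in range(4):
        profiles.append([
            (3.0 if group * block <= g < (group + 1) * block else 0.0)
            + random.gauss(0, 0.3)
            for g in range(n_genes)
        ])

clusters = agglomerative_clusters(profiles, n_clusters=3)
print(clusters)
```

On real data the number of clusters is not known in advance; it would have to be chosen by inspecting the dendrogram or a cluster-stability criterion, which this sketch leaves out.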

Results: Using transcriptional profiles of lung tissue from 131 patients with IPF/UIP and 12 non-diseased controls, we identified six IPF subgroups that generally correlated with disease severity and lung function decline. Network-informed clustering identified the most severe IPF subgroup, which was enriched with genes regulating inflammatory processes, blood pressure, and branching morphogenesis of the lung. Genes differentially expressed in the six IPF subgroups compared with healthy controls include transcripts related to the extracellular matrix, epithelial-mesenchymal cell cross-talk, calcium ion homeostasis, and oxygen transport. Further, we compiled differentially expressed gene signatures to identify unique gene clusters that can segregate IPF from normal samples, and severe from mild IPF. These signatures were additionally validated in three independent IPF/UIP cohorts. Finally, using knowledge-based approaches, we identified several novel candidate genes that may also serve as potential biomarkers of IPF.
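The idea of compiling a differentially expressed gene signature can be sketched as ranking genes by a two-group test statistic and keeping the top-ranked ones. The abstract does not state which statistic or cutoff the authors used; Welch's t-statistic and a fixed signature size are assumptions made only for illustration, and the gene names and expression values below are synthetic placeholders.

```python
import math
import random
from statistics import mean, variance


def welch_t(case, control):
    """Welch's t-statistic for two groups with unequal variances."""
    se = math.sqrt(variance(case) / len(case)
                   + variance(control) / len(control))
    return (mean(case) - mean(control)) / se


# Synthetic log-scale expression: 10 placeholder genes,
# 12 hypothetical IPF samples and 6 hypothetical control samples.
random.seed(1)
genes = [f"gene_{i}" for i in range(10)]
shift = {"gene_0": 3.0, "gene_1": -3.0}  # planted up- and down-regulation

ipf = {g: [random.gauss(8.0 + shift.get(g, 0.0), 0.3) for _ in range(12)]
       for g in genes}
control = {g: [random.gauss(8.0, 0.3) for _ in range(6)] for g in genes}

# Score every gene, then keep the two with the largest absolute statistic.
scores = {g: welch_t(ipf[g], control[g]) for g in genes}
signature = sorted(genes, key=lambda g: abs(scores[g]), reverse=True)[:2]
print(signature)
```

A real analysis would also correct for multiple testing across thousands of probes and confirm the signature in independent cohorts, as the abstract describes for its validation step.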

Conclusions: Discovery of unique and redundant gene signatures for subgroups in IPF can be greatly facilitated through unsupervised clustering. Findings derived from such gene signatures may provide insights into pathogenesis of IPF and facilitate the development of clinically useful biomarkers.

Keywords: Gene expression analysis; Gene signature; IPF subtyping; Idiopathic pulmonary fibrosis; IPF.

MeSH terms

  • Aged
  • Biomarkers / blood*
  • Case-Control Studies
  • Cluster Analysis
  • Female
  • Humans
  • Idiopathic Pulmonary Fibrosis / diagnosis*
  • Idiopathic Pulmonary Fibrosis / genetics*
  • Logistic Models
  • Lung / pathology
  • Male
  • Middle Aged
  • Oligonucleotide Array Sequence Analysis
  • Transcriptome*


Substances

  • Biomarkers