Employing gene set top scoring pairs to identify deregulated pathway-signatures in dilated cardiomyopathy from integrated microarray gene expression data

Methods Mol Biol. 2012;802:345-61. doi: 10.1007/978-1-61779-400-1_23.


It is well accepted that a set of genes must act in concert to drive various cellular processes. However, under different biological phenotypes, not all members of a gene set participate in a given biological process. Hence, it is useful to construct a discriminative classifier that focuses on the core members (a subset) of a highly informative gene set. Such analyses can reveal which subsets of the same gene set correspond to different biological phenotypes. In this study, we propose the Gene Set Top Scoring Pairs (GSTSP) approach, which exploits the simple yet powerful concept of relative expression reversal at the gene set level to achieve these goals. To illustrate the usefulness of GSTSP, we applied this method to five different human heart failure gene expression data sets. We take advantage of the direct data integration feature of the GSTSP approach to combine two data sets, identify a discriminative gene set from more than 190 predefined gene sets, and evaluate the predictive power of the GSTSP classifier derived from this informative gene set on three independent test sets (79.31% test accuracy). The discriminative gene pairs identified in this study may provide new biological understanding of the disturbed pathways involved in the development of heart failure. The GSTSP methodology is general-purpose and applicable to a variety of phenotypic classification problems using gene expression data.
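The relative expression reversal idea can be illustrated with a minimal sketch. It assumes the standard top scoring pair (TSP) score, |P(X_i < X_j | A) - P(X_i < X_j | B)|, evaluated only over gene pairs drawn from within one gene set, a single top pair per set, and a single-pair decision rule. All function names (`pair_score`, `top_pair_in_gene_set`, `best_gene_set`, `classify`), the genes G1 to G3, and the sets SET_X and SET_Y are hypothetical toy values, not the authors' implementation; the published GSTSP procedure may differ in tie-breaking, in the number of pairs used, or in how the gene set is selected.

```python
import math
from itertools import combinations
from typing import Dict, Optional, Sequence, Tuple

# One array = mapping from (hypothetical) gene name to expression value.
Sample = Dict[str, float]
Pair = Tuple[str, str]


def pair_score(class_a: Sequence[Sample], class_b: Sequence[Sample],
               gi: str, gj: str) -> float:
    """TSP-style score |P(X_gi < X_gj | A) - P(X_gi < X_gj | B)|,
    computed from within-sample comparisons only."""
    p_a = sum(s[gi] < s[gj] for s in class_a) / len(class_a)
    p_b = sum(s[gi] < s[gj] for s in class_b) / len(class_b)
    return abs(p_a - p_b)


def top_pair_in_gene_set(class_a: Sequence[Sample], class_b: Sequence[Sample],
                         genes: Sequence[str]) -> Tuple[Optional[Pair], float]:
    """Highest-scoring pair restricted to one gene set (first pair wins ties)."""
    best_pair: Optional[Pair] = None
    best_score = -1.0
    for gi, gj in combinations(sorted(genes), 2):
        score = pair_score(class_a, class_b, gi, gj)
        if score > best_score:
            best_pair, best_score = (gi, gj), score
    return best_pair, best_score


def best_gene_set(class_a, class_b, gene_sets: Dict[str, Sequence[str]]):
    """Return (set name, top pair, score) for the most discriminative gene set."""
    best = None
    for name, genes in gene_sets.items():
        pair, score = top_pair_in_gene_set(class_a, class_b, genes)
        if pair is not None and (best is None or score > best[2]):
            best = (name, pair, score)
    return best


def classify(sample: Sample, pair: Pair,
             class_a: Sequence[Sample], class_b: Sequence[Sample]) -> str:
    """Predict 'A' if the sample shows the ordering X_gi < X_gj that is more
    frequent in class A's training arrays, otherwise 'B'."""
    gi, gj = pair
    p_a = sum(s[gi] < s[gj] for s in class_a) / len(class_a)
    p_b = sum(s[gi] < s[gj] for s in class_b) / len(class_b)
    gi_below_gj = sample[gi] < sample[gj]
    if p_a >= p_b:
        return "A" if gi_below_gj else "B"
    return "B" if gi_below_gj else "A"


# Toy training data: three arrays per phenotype (values are made up).
class_a = [{"G1": 1.0, "G2": 5.0, "G3": 3.0},
           {"G1": 2.0, "G2": 6.0, "G3": 1.0},
           {"G1": 1.5, "G2": 4.0, "G3": 5.0}]
class_b = [{"G1": 7.0, "G2": 3.0, "G3": 4.0},
           {"G1": 6.0, "G2": 2.0, "G3": 2.0},
           {"G1": 8.0, "G2": 5.0, "G3": 6.0}]
gene_sets = {"SET_X": ["G2", "G3"], "SET_Y": ["G1", "G2", "G3"]}

name, pair, score = best_gene_set(class_a, class_b, gene_sets)
print(name, pair, score)  # SET_Y ('G1', 'G2') 1.0
print(classify({"G1": 0.5, "G2": 9.0, "G3": 1.0}, pair, class_a, class_b))  # A

# Log-transforming one study's arrays (a per-sample monotone map) leaves the score unchanged.
log_b = [{g: math.log2(v) for g, v in s.items()} for s in class_b]
print(pair_score(class_a, log_b, "G1", "G2"))  # 1.0
```

Because the score depends only on whether one gene is expressed below another within the same array, any order-preserving transformation applied per array (log scaling, rescaling) leaves it unchanged. This is the usual rationale for pooling data sets directly with rank-based pair classifiers, without cross-study normalization.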

MeSH terms

  • Algorithms*
  • Cardiomyopathy, Dilated / genetics*
  • Cluster Analysis
  • Computational Biology / methods
  • Computer Simulation
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation*
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods*
  • Reproducibility of Results
  • Signal Transduction*