Integrating genetic and gene expression evidence into genome-wide association analysis of gene sets

Genome Res. 2012 Feb;22(2):386-97. doi: 10.1101/gr.124370.111. Epub 2011 Sep 22.


Single variant or single gene analyses generally account for only a small proportion of the phenotypic variation in complex traits. Alternatively, gene set or pathway association analyses are playing an increasingly important role in uncovering genetic architectures of complex traits through the identification of systematic genetic interactions. Two dominant paradigms for gene set analyses are association analyses based on SNP genotypes and those based on gene expression profiles. However, gene-disease association can manifest in many ways, such as alterations of gene expression, genotype, and copy number; thus, an integrative approach combining multiple forms of evidence can more accurately and comprehensively capture pathway associations. We have developed a single statistical framework, Gene Set Association Analysis (GSAA), that simultaneously measures genome-wide patterns of genetic variation and gene expression variation to identify sets of genes enriched for differential expression and/or trait-associated genetic markers. Simulation studies illustrate that joint analyses of genomic data increase the power to detect real associations when compared with gene set methods that use only one genomic data type. The analysis of two human diseases, glioblastoma and Crohn's disease, detected abnormalities in previously identified disease-associated pathways, such as pathways related to PI3K signaling, DNA damage response, and the activation of NFKB. In addition, GSAA predicted novel pathway associations, for example, differential genetic and expression characteristics in genes from the ABC transporter family in glioblastoma and from the HLA system in Crohn's disease. These results demonstrate that GSAA can help uncover biological pathways underlying human diseases and complex traits.
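
The abstract does not spell out how the two kinds of evidence are combined, so the following is only a minimal sketch of the general idea, not the GSAA algorithm as published. It assumes that each gene already has one z-score from SNP association and one from differential expression, combines them with a simple scaled sum of absolute values, and scores a gene set with a weighted running-sum statistic over the ranked gene list. The function names, the combination rule, the running-sum statistic, and the toy data are all hypothetical choices made for illustration.

```python
from math import sqrt


def combine_scores(snp_z, expr_z):
    """Combine per-gene evidence from two data types (illustrative rule only).

    Assumption: both inputs map gene name -> z-score. The combined score is
    the sum of absolute z-scores scaled by sqrt(2); only genes present in
    both inputs are kept.
    """
    genes = snp_z.keys() & expr_z.keys()
    return {g: (abs(snp_z[g]) + abs(expr_z[g])) / sqrt(2) for g in genes}


def enrichment_score(gene_scores, gene_set):
    """Weighted running-sum enrichment score for one gene set.

    Genes are ranked by combined score (highest first). Walking down the
    ranking, the running sum rises at genes in the set (in proportion to
    their score) and falls at genes outside it. The returned value is the
    running sum's largest deviation from zero, with its sign.
    """
    ranked = sorted(gene_scores, key=gene_scores.get, reverse=True)
    hits = [g for g in ranked if g in gene_set]
    if not hits or len(hits) == len(ranked):
        return 0.0  # undefined when the set is empty or covers every gene
    hit_total = sum(gene_scores[g] for g in hits)
    if hit_total == 0:
        return 0.0  # no evidence at all in the set
    miss_step = 1.0 / (len(ranked) - len(hits))
    running = 0.0
    best = 0.0
    for g in ranked:
        if g in gene_set:
            running += gene_scores[g] / hit_total
        else:
            running -= miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Toy data (made up): genes A and B carry signal in both data types.
snp_z = {"A": 3.0, "B": 2.5, "C": 0.1, "D": -0.2, "E": 0.3}
expr_z = {"A": 2.0, "B": -3.0, "C": 0.2, "D": 0.1, "E": -0.1}

combined = combine_scores(snp_z, expr_z)
es_top = enrichment_score(combined, {"A", "B"})
es_bottom = enrichment_score(combined, {"C", "D"})
print(es_top, es_bottom)
```

In this toy example the set containing A and B sits at the top of the ranking and gets a positive score, while the set containing C and D sits at the bottom and gets a negative one. A real analysis would also need a significance estimate, for example by permuting phenotype labels, which this sketch leaves out.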

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Computer Simulation
  • Crohn Disease / genetics
  • Gene Expression Profiling*
  • Genetic Predisposition to Disease
  • Genome-Wide Association Study*
  • Genomics
  • Humans
  • Models, Genetic
  • Neoplasms / genetics
  • Polymorphism, Single Nucleotide*
  • Signal Transduction