Insights into colon cancer etiology via a regularized approach to gene set analysis of GWAS data

Am J Hum Genet. 2010 Jun 11;86(6):860-71. doi: 10.1016/j.ajhg.2010.04.014.


Genome-wide association studies (GWAS) have successfully identified susceptibility loci from marginal association analysis of SNPs. Valuable insight into genetic variation underlying complex diseases will likely be gained by considering functionally related sets of genes simultaneously. One approach is to further develop gene set enrichment analysis methods, which originated in gene expression studies, to account for the distinctive features of GWAS data. These features include the large number of SNPs per gene, the modest and sparse SNP associations, and the additional information provided by linkage disequilibrium (LD) patterns within genes. We propose a "gene set ridge regression in association studies (GRASS)" algorithm. GRASS summarizes the genetic structure for each gene as eigenSNPs and uses a novel form of regularized regression, termed group ridge regression, to select representative eigenSNPs for each gene and assess their joint association with disease risk. Compared with existing methods, the proposed algorithm greatly reduces the high dimensionality of GWAS data while still accounting for multiple hits and/or LD in the same gene. We show by simulation that this algorithm performs well in situations in which the number of predictors is large relative to the sample size. We applied the GRASS algorithm to a genome-wide association study of colon cancer and identified nicotinate and nicotinamide metabolism and transforming growth factor beta signaling as the top two significantly enriched pathways. Elucidating the role of variation in these pathways may enhance our understanding of colon cancer etiology.
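
The eigenSNP idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that eigenSNPs are the leading principal components of each gene's SNP genotype matrix, and it substitutes ordinary ridge regression for the group ridge penalty, whose exact form the abstract does not give. The gene names, the 95% variance cutoff, the penalty value `lam`, and the simulated genotypes are all hypothetical.

```python
import numpy as np

def eigen_snps(genotypes, var_explained=0.95):
    """Leading principal-component scores (eigenSNPs) for one gene.

    genotypes: (n_samples, n_snps) array of minor-allele counts (0, 1, 2).
    var_explained: hypothetical cumulative-variance cutoff, not from the paper.
    """
    X = genotypes - genotypes.mean(axis=0)          # center each SNP
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    var = s ** 2
    if var.sum() == 0:                              # monomorphic gene: nothing to keep
        return np.zeros((X.shape[0], 0))
    cum = np.cumsum(var) / var.sum()
    k = min(int(np.searchsorted(cum, var_explained)) + 1, len(s))
    return U[:, :k] * s[:k]                         # PC scores, one column per eigenSNP

def ridge_fit(Z, y, lam=1.0):
    """Ordinary ridge on centered data: solve (Z'Z + lam*I) beta = Z'y.

    Stand-in only: the paper's group ridge penalty is NOT reproduced here.
    """
    Zc = Z - Z.mean(axis=0)
    yc = y - y.mean()
    p = Zc.shape[1]
    return np.linalg.solve(Zc.T @ Zc + lam * np.eye(p), Zc.T @ yc)

# Simulated gene set: a shared latent factor per gene mimics LD among its SNPs.
rng = np.random.default_rng(0)
n = 200
gene_sizes = {"geneA": 12, "geneB": 8}              # hypothetical genes
genos = {}
for name, m in gene_sizes.items():
    raw = rng.normal(size=(n, 1)) + 0.5 * rng.normal(size=(n, m))
    genos[name] = np.digitize(raw, [-0.5, 0.5])     # genotype codes 0, 1, 2

# Case/control status driven by one SNP in geneA (toy disease model).
logit = 0.8 * (genos["geneA"][:, 0] - 1)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

pcs = {name: eigen_snps(g) for name, g in genos.items()}
Z = np.hstack([pcs[name] for name in gene_sizes])   # eigenSNPs for the whole gene set
beta = ridge_fit(Z, y, lam=1.0)
print({name: p.shape[1] for name, p in pcs.items()}, beta.shape)
```

Because the SNPs within a simulated gene are correlated, fewer eigenSNPs than SNPs are typically needed to reach the variance cutoff, which is the dimension reduction the abstract describes. A faithful GRASS analysis would use the group ridge penalty with a binary-outcome model and a permutation-based pathway test as specified in the full paper.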

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Colonic Neoplasms / genetics
  • Genome-Wide Association Study / methods*
  • Humans
  • Polymorphism, Single Nucleotide
  • Principal Component Analysis
  • Regression Analysis

Grant support