Gene, region and pathway level analyses in whole-genome studies

Genet Epidemiol. 2010 Apr;34(3):222-231. doi: 10.1002/gepi.20452.


In the setting of genome-wide association studies, we propose a method for assigning a measure of significance to pre-defined sets of markers in the genome. The sets can be genes, conserved regions, or groups of genes such as pathways. Using the proposed methods and algorithms, evidence for association between a particular functional unit and disease status can be obtained not only from a strong signal at a single SNP within it, but also from the combination of several simultaneous weaker signals that are not strongly correlated. This approach has several advantages. First, moderately strong signals from different SNPs are combined to obtain a much stronger signal for the set, thereby increasing power. Second, in combination with methods that provide information on untyped markers, it leads to results that can be readily combined across studies and platforms that might use different SNPs. Third, the results are easy to interpret, since they refer to functional sets of markers that are likely to behave as a unit in their phenotypic effect. Finally, the availability of gene-level P-values for association is the first step in developing methods that integrate information from pathways and networks with genome-wide association data, and these can lead to a better understanding of the genetic architecture of complex traits. The power of the approach is investigated in simulated and real datasets. Novel Crohn's disease associations are found using the WTCCC data.
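
The abstract does not spell out how the set-level significance is computed, so the following is only an illustrative sketch of the general idea of combining several weaker SNP signals into one gene-level P-value. It assumes a simple sum of per-SNP trend statistics with a phenotype-permutation null, which keeps the correlation between SNPs intact; this is a generic stand-in and not necessarily the authors' algorithm. The function names (`snp_stat`, `set_stat`, `set_p_value`) and the data layout (one list of 0/1/2 genotype codes per SNP, one 0/1 disease status per individual) are hypothetical.

```python
import random


def snp_stat(genotypes, phenotype):
    """Per-SNP trend statistic: n times the squared correlation between
    genotype codes and phenotype (approximately chi-square, 1 df)."""
    n = len(phenotype)
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotype) / n
    sgg = sum((g - mean_g) ** 2 for g in genotypes)
    spp = sum((p - mean_p) ** 2 for p in phenotype)
    if sgg == 0 or spp == 0:
        return 0.0  # monomorphic SNP or constant phenotype: no signal
    sgp = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotype))
    return n * sgp * sgp / (sgg * spp)


def set_stat(snp_columns, phenotype):
    """Set-level statistic (assumption): sum of the per-SNP statistics."""
    return sum(snp_stat(col, phenotype) for col in snp_columns)


def set_p_value(snp_columns, phenotype, n_perm=1000, seed=0):
    """Empirical P-value for a marker set.

    Phenotype labels are shuffled while genotypes stay fixed, so the
    correlation (LD) among SNPs in the set is preserved under the null.
    """
    rng = random.Random(seed)
    observed = set_stat(snp_columns, phenotype)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if set_stat(snp_columns, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

Calling `set_p_value(snp_columns, phenotype)` with the SNPs mapped to one gene would give one P-value for that gene. Summing statistics is only one possible way to pool signals; the permutation step is what accounts for correlated SNPs in this sketch.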

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Computer Simulation
  • Crohn Disease / genetics
  • Genome, Human
  • Genome-Wide Association Study / methods*
  • Genotype
  • Humans
  • Models, Genetic
  • Models, Statistical
  • Molecular Epidemiology
  • Phenotype
  • Polymorphism, Single Nucleotide*