Assessing gene length biases in gene set analysis of Genome-Wide Association Studies

Int J Comput Biol Drug Des. 2010;3(4):297-310. doi: 10.1504/IJCBDD.2010.038394. Epub 2011 Feb 4.


Genome-Wide Association Studies (GWAS) have rapidly become a major genetics approach to studying complex diseases. Although many susceptibility variants and genes have been uncovered by single marker analysis, gene set based analysis is emerging as a very promising approach that aims to detect the joint association of a set of genes with disease. In the available gene set based methods, the smallest P value among the Single Nucleotide Polymorphisms (SNPs) in a gene region is often used to represent the gene-level association signal. This approach may bias the association signal strongly towards long genes, which tend to contain more SNPs. In this study, we propose a resampling strategy that randomly generates genomic intervals across the accessible genomic region to estimate the background distribution of P values at the gene level. When compared with the gene-wise P value in the real data, the proportion of random intervals can be used to assess the bias that gene length might introduce, and in turn to help investigators choose appropriate gene set analysis algorithms for their GWAS datasets. Our method uses only summarised GWAS data and requires no permutation, so it is computationally efficient. A computer program is freely available to users.
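The resampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' program. The function names, the choice to match each random interval to the gene's length, the use of the span between the first and last SNP as the "accessible region", and the definition of the reported proportion (random intervals whose smallest P value is at most the gene's) are all assumptions made for illustration. It works on summary data for a single chromosome: sorted SNP positions and their P values.

```python
import bisect
import random


def min_p_in_interval(positions, pvalues, start, end):
    """Smallest P value among SNPs with start <= position <= end.

    `positions` must be sorted ascending; returns None if no SNP falls inside.
    """
    lo = bisect.bisect_left(positions, start)
    hi = bisect.bisect_right(positions, end)
    if lo == hi:
        return None
    return min(pvalues[lo:hi])


def background_proportion(positions, pvalues, gene_start, gene_end,
                          n_intervals=1000, seed=0):
    """Compare a gene's smallest SNP P value with random same-length intervals.

    Assumption (not from the paper): the accessible region is taken to be
    the span from the first to the last SNP on the chromosome.
    Returns (observed_min_p, proportion), where proportion is the fraction
    of usable random intervals whose smallest P value is <= the observed one.
    """
    observed = min_p_in_interval(positions, pvalues, gene_start, gene_end)
    if observed is None:
        raise ValueError("no SNPs fall inside the gene region")

    length = gene_end - gene_start
    region_start, region_end = positions[0], positions[-1]
    if region_end - length < region_start:
        raise ValueError("gene is longer than the accessible region")

    rng = random.Random(seed)
    hits = 0
    usable = 0
    for _ in range(n_intervals):
        start = rng.randint(region_start, region_end - length)
        p = min_p_in_interval(positions, pvalues, start, start + length)
        if p is None:
            continue  # interval contains no SNPs, so it carries no signal
        usable += 1
        if p <= observed:
            hits += 1

    proportion = hits / usable if usable else None
    return observed, proportion


# Toy usage with made-up summary statistics (hypothetical data).
toy_positions = list(range(0, 100000, 100))
toy_rng = random.Random(42)
toy_pvalues = [toy_rng.random() for _ in toy_positions]
obs, prop = background_proportion(toy_positions, toy_pvalues, 20000, 45000)
print(f"gene min P = {obs:.4g}, background proportion = {prop:.3f}")
```

Under these assumptions, a proportion close to 1 for a long gene would suggest that its small gene-wise P value is largely what any interval of that length would produce, while a small proportion would suggest signal beyond the length effect. Because only positions and P values are used, no genotype-level permutation is needed.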

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Genetic Predisposition to Disease*
  • Genome-Wide Association Study / methods*
  • Genomics
  • Humans
  • Polymorphism, Single Nucleotide*