Several genome-wide association studies (GWAS) have been published on various complex diseases. Although new loci have been found to be associated with these diseases, only a small fraction of their genetic risk can be explained so far. Because GWAS are still underpowered to detect small main effects, and gene-gene interactions are likely to play a role, the data may not currently be analyzed to their full potential. In this study, we evaluated alternative methods for analyzing GWAS data. Instead of focusing on the single nucleotide polymorphisms (SNPs) with the highest statistical significance, we took advantage of prior biological information and tried to detect overrepresented pathways in the GWAS data. We evaluated whether pathway classification analysis can help prioritize the biological pathways most likely to be involved in disease etiology. We present the benefits and limitations of pathway classification tools in analyzing GWAS data, and we show multiple differences in outcome between pathway tools analyzing the same dataset. Furthermore, analyzing randomly selected SNPs always results in significantly overrepresented pathways, large pathways have a higher chance of becoming statistically significant, and the bioinformatics tools used in this study are biased toward detecting well-defined pathways. As an example, we analyzed data from two GWAS on type 2 diabetes (T2D): the Diabetes Genetics Initiative (DGI) and the Wellcome Trust Case Control Consortium (WTCCC). The results from the DGI and WTCCC GWAS occasionally showed concordance in overrepresented pathways but discordance in the corresponding genes. Thus, incorporating gene networks and pathway classification tools into the analysis can point toward significantly overrepresented molecular pathways that cannot be picked up by traditional single-locus analyses. However, the limitations discussed in this study need to be addressed before these methods can be widely used.
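Pathway overrepresentation is commonly assessed with a one-sided hypergeometric test. The sketch below is a minimal illustration under stated assumptions, not the method of any specific tool evaluated in this study. It assumes SNPs have already been mapped to genes. The function name `overrepresentation_p` and all numbers are hypothetical. It also illustrates one plausible reason large pathways reach significance more readily: at the same twofold enrichment, a pathway with more annotated genes gives the test more power and therefore a smaller p-value.

```python
from math import comb


def overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    Hypothetical helper for illustration only.
    N: genes in the background set (assumed: all genes SNPs could map to)
    K: background genes annotated to the pathway
    n: genes hit by the selected SNPs
    k: hit genes that fall in the pathway
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of observing k or more pathway genes among the n hits.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers chosen only to illustrate the pathway-size effect.
    N, n = 20_000, 200
    # Both pathways show twofold enrichment over the expected count n * K / N.
    small = overrepresentation_p(k=2, n=n, K=100, N=N)      # expected 1, observed 2
    large = overrepresentation_p(k=20, n=n, K=1_000, N=N)   # expected 10, observed 20
    print(f"small pathway p = {small:.3g}")
    print(f"large pathway p = {large:.3g}")
```

The background set N and the SNP-to-gene mapping are assumptions in this sketch. Changing either one changes the p-values, so its output should not be compared directly with results from published pathway tools.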
© 2009 Wiley-Liss, Inc.