Bioinformatics. 2015 Apr 1;31(7):1007-15.
doi: 10.1093/bioinformatics/btu783. Epub 2014 Nov 26.

MGAS: a powerful tool for multivariate gene-based genome-wide association analysis

Free PMC article

Sophie Van der Sluis et al. Bioinformatics.

Abstract

Motivation: Standard genome-wide association studies, testing the association between one phenotype and a large number of single nucleotide polymorphisms (SNPs), are limited in two ways: (i) traits are often multivariate, and analysis of composite scores entails loss in statistical power and (ii) gene-based analyses may be preferred, e.g. to decrease the multiple testing problem.

Results: Here we present a new method, multivariate gene-based association test by extended Simes procedure (MGAS), that allows gene-based testing of multivariate phenotypes in unrelated individuals. Through extensive simulation, we show that under most trait-generating genotype-phenotype models MGAS has superior statistical power to detect associated genes compared with gene-based analyses of univariate phenotypic composite scores (i.e. GATES, multiple regression), and multivariate analysis of variance (MANOVA). Re-analysis of metabolic data revealed 32 False Discovery Rate controlled genome-wide significant genes, and 12 regions harboring multiple genes; of these 44 regions, 30 were not reported in the original analysis.
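
The abstract names an extended Simes procedure but does not give its formula. As a rough illustration only, the sketch below implements the classical Simes combination of p-values, min over j of m * p_(j) / j for the ordered p-values p_(1) <= ... <= p_(m). The optional `effective_counts` argument is a hypothetical stand-in for the effective numbers of independent tests that extended variants such as GATES substitute for the ranks; how MGAS derives those quantities across SNPs and phenotypes is not stated here, so this should not be read as the authors' implementation.

```python
from typing import Optional, Sequence


def simes_p(pvalues: Sequence[float],
            effective_counts: Optional[Sequence[float]] = None) -> float:
    """Combine p-values into one p-value with a Simes-type procedure.

    Plain Simes: min over j of m * p_(j) / j, where p_(1) <= ... <= p_(m).

    effective_counts is an ASSUMED, hypothetical input: effective_counts[j - 1]
    is taken to be the effective number of independent tests among the j
    smallest p-values. When given, it replaces the rank j, and its last
    entry replaces m. When omitted, the function reduces to plain Simes.
    """
    p_sorted = sorted(pvalues)
    m = len(p_sorted)
    if m == 0:
        raise ValueError("need at least one p-value")
    if effective_counts is None:
        # Ranks 1..m correspond to fully independent tests.
        effective_counts = list(range(1, m + 1))
    if len(effective_counts) != m:
        raise ValueError("effective_counts must have one entry per p-value")

    m_eff = effective_counts[-1]
    combined = min(m_eff * p / k for p, k in zip(p_sorted, effective_counts))
    # A p-value cannot exceed 1.
    return min(combined, 1.0)


if __name__ == "__main__":
    # Three made-up SNP-level p-values for a single gene.
    print(simes_p([0.01, 0.04, 0.03]))
    # Same p-values with made-up effective test counts reflecting correlation.
    print(simes_p([0.01, 0.04, 0.03], effective_counts=[1.0, 1.5, 2.0]))
```

With correlated tests the effective counts are smaller than the ranks, so the combined p-value is penalized less than under plain Simes.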

Conclusion: MGAS allows researchers to conduct their multivariate gene-based analyses efficiently, and without the loss of power that is often associated with an incorrectly specified genotype-phenotype model.

Availability and implementation: MGAS is freely available in KGG v3.0 (http://statgenpro.psychiatry.hku.hk/limx/kgg/download.php). Access to the metabolic dataset can be requested at dbGaP (https://dbgap.ncbi.nlm.nih.gov/). The R-simulation code is available from http://ctglab.nl/people/sophie_van_der_sluis.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Schematic representation of six trait-generating genotype–phenotype models. (A) 1-factor model with all SNPs within a gene affecting the latent factor, and through the latent factor all underlying phenotypes. (B) 1-factor model with all SNPs within a gene affecting one underlying phenotype directly. (C) 4-factor model with all SNPs within a gene affecting only one of the four latent factors, and all phenotypes underlying that factor. (D) 4-factor model with all SNPs within a gene affecting one underlying phenotype directly. (E) Network model in which all phenotypes are equally and bidirectionally related, yielding a phenotypic variance–covariance matrix mimicking that of a 1-factor model. All SNPs within a gene affect one phenotype directly and all related phenotypes indirectly. (F) Network model distinguishing four clusters of phenotypes; all phenotypes are bidirectionally related, with relations being stronger within, compared with between, clusters, yielding a phenotypic variance–covariance matrix mimicking that of a 4-factor model. All SNPs within a gene affect one phenotype directly and all related phenotypes indirectly. See Supplementary Material for specific simulation settings.
Fig. 2.
Radial power plots for six trait-generating genotype–phenotype models (A–F) and various genetic situations (I–XI). (A) 1-factor model with gene effects on the latent factor. (B) 1-factor model with gene affecting only one phenotype directly. (C) 4-factor model with gene effect on only one of four latent factors. (D) 4-factor model with gene affecting only one phenotype directly. (E) Network model mimicking 1-factor model with gene affecting one phenotype directly and all related phenotypes indirectly. (F) Network model mimicking 4-factor model with gene affecting one phenotype directly and all related phenotypes indirectly. I–III represent results for the large gene (60 SNPs): (I): eight LD blocks, one DSL; (II): eight LD blocks, eight DSL; (III): eight LD blocks, eight DSL of opposite effect. IV–XI represent results for a small gene (10 SNPs): (IV): one LD block, one DSL; (V): one LD block, two DSL; (VI): one LD block, four DSL; (VII): two LD blocks, one DSL; (VIII): two LD blocks, two DSL; (IX): two LD blocks, two DSL of opposite effect; (X): one LD block, one DSL conveying opposite effects on different phenotypes (that in network models resided in the same cluster); (XI): one LD block, one DSL conveying opposite effects on phenotypes in different clusters. Specific power results and Type I error rates for all scenarios are in Supplementary Tables S2–S7. Because MGAS has less power to detect larger genes (see Supplementary Material), small genes and large genes were simulated to explain 0.5 and 1% of the variance, respectively. Power results of the five methods are thus not directly comparable between scenarios (see Supplementary Material).
