A guide to genome-wide association analysis and post-analytic interrogation

Eric Reed; Sara Nunez; David Kulp; Jing Qian; Muredach P Reilly; Andrea S Foulkes

doi:10.1002/sim.6605

Stat Med. 2015 Dec 10;34(28):3769-92. doi: 10.1002/sim.6605. Epub 2015 Sep 6.

Authors

Eric Reed¹, Sara Nunez¹, David Kulp², Jing Qian³, Muredach P Reilly⁴, Andrea S Foulkes¹

Affiliations

¹ Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA, U.S.A.
² Department of Computer Science, University of Massachusetts, Amherst, MA, U.S.A.
³ Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA, U.S.A.
⁴ Department of Medicine, University of Pennsylvania, Philadelphia, PA, U.S.A.

Abstract

This tutorial is a learning resource that outlines the basic process and provides specific software tools for implementing a complete genome-wide association analysis. Approaches to post-analytic visualization and interrogation of potentially novel findings are also presented. Applications are illustrated using the free and open-source R statistical computing and graphics software environment, Bioconductor software for bioinformatics and the UCSC Genome Browser. Complete genome-wide association data on 1401 individuals across 861,473 typed single nucleotide polymorphisms from the PennCATH study of coronary artery disease are used for illustration. All data and code, as well as additional instructional resources, are publicly available through the Open Resources in Statistical Genomics project: http://www.stat-gen.org.

Keywords: Bioconductor; Hardy-Weinberg equilibrium (HWE); IBD; Manhattan plot; Q-Q plot; R code; SNP filtering; UCSC Genome Browser; ancestry; call rate; genome-wide association (GWA) study; heatmap; heterozygosity; imputation; lambda statistic; minor allele frequency (MAF); parallel processing; principal component analysis (PCA); regional association plot; relatedness; sample filtering; statistical genomics; substructure; tutorial.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Computational Biology*
Databases, Genetic
Genome-Wide Association Study* / statistics & numerical data
Humans
Software
