A guide to genome-wide association analysis and post-analytic interrogation

Stat Med. 2015 Dec 10;34(28):3769-92. doi: 10.1002/sim.6605. Epub 2015 Sep 6.

Abstract

This tutorial is a learning resource that outlines the basic process of a complete genome-wide association analysis and provides specific software tools for implementing it. Approaches to post-analytic visualization and interrogation of potentially novel findings are also presented. Applications are illustrated using the free and open-source R statistical computing and graphics software environment, Bioconductor software for bioinformatics, and the UCSC Genome Browser. Complete genome-wide association data on 1,401 individuals across 861,473 typed single nucleotide polymorphisms from the PennCATH study of coronary artery disease are used for illustration. All data and code, as well as additional instructional resources, are publicly available through the Open Resources in Statistical Genomics project: http://www.stat-gen.org.

Keywords: Bioconductor; Hardy-Weinberg equilibrium (HWE); IBD; Manhattan plot; Q-Q plot; R code; SNP filtering; UCSC Genome Browser; ancestry; call rate; genome-wide association (GWA) study; heatmap; heterozygosity; imputation; lambda statistic; minor allele frequency (MAF); parallel processing; principal component analysis (PCA); regional association plot; relatedness; sample filtering; statistical genomics; substructure; tutorial.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Computational Biology*
  • Databases, Genetic
  • Genome-Wide Association Study* / statistics & numerical data
  • Humans
  • Software