This tutorial is a learning resource that outlines the basic process and provides specific software tools for implementing a complete genome-wide association analysis. Approaches to post-analytic visualization and interrogation of potentially novel findings are also presented. Applications are illustrated using the free and open-source R statistical computing and graphics software environment, Bioconductor software for bioinformatics and the UCSC Genome Browser. Complete genome-wide association data on 1401 individuals across 861,473 typed single nucleotide polymorphisms from the PennCATH study of coronary artery disease are used for illustration. All data and code, as well as additional instructional resources, are publicly available through the Open Resources in Statistical Genomics project: http://www.stat-gen.org.
Keywords: Bioconductor; Hardy-Weinberg equilibrium (HWE); IBD; Manhattan plot; Q-Q plot; R code; SNP filtering; UCSC Genome Browser; ancestry; call rate; genome-wide association (GWA) study; heatmap; heterozygosity; imputation; lambda statistic; minor allele frequency (MAF); parallel processing; principal component analysis (PCA); regional association plot; relatedness; sample filtering; statistical genomics; substructure; tutorial.
© 2015 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.