A simple and accurate method to determine genomewide significance for association tests in sequencing studies

Genet Epidemiol. 2019 Jun;43(4):365-372. doi: 10.1002/gepi.22183. Epub 2019 Jan 8.

Abstract

Whole-exome sequencing (WES) and whole-genome sequencing (WGS) studies are underway to investigate the impact of genetic variants on complex diseases and traits. It is customary to perform single-variant association tests for common variants and region-based association tests for rare variants. The latter may target variants with similar or opposite effects, interrogate variants with different frequencies or different functional annotations, and examine a variety of regions. The large number of tests that are performed necessitates adjustment for multiple testing. The conventional Bonferroni correction is overly conservative as the test statistics are correlated. To address this challenge, we propose a simple and accurate method based on parametric bootstrap to assess genomewide significance. We show that the correlations of the test statistics are determined primarily by the genotypes, such that the same significance threshold can be used in different studies that share a common sequencing platform. We demonstrate the usefulness of the proposed method with WES data from the National Heart, Lung, and Blood Institute Exome Sequencing Project and WGS data from the 1000 Genomes Project. We recommend the p value of 5 × 10⁻⁹ as the genomewide significance threshold for testing all common and low-frequency variants (MAFs ≥ 0.1%) in the human genome.
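The general idea of a bootstrap-based threshold can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes simulated genotypes, a continuous trait with no covariates, and single-variant score tests only. The function names `toy_genotypes` and `bootstrap_threshold`, the AR(1) linkage model, and all parameter values are hypothetical choices made for illustration. Null residuals are drawn from a standard normal distribution, the standardized score statistics are recomputed for every draw, and the p value threshold is read from the upper quantile of the maximum absolute statistic, so the correlation among tests enters only through the genotypes.

```python
import math
from statistics import NormalDist

import numpy as np


def toy_genotypes(n_subjects=500, n_variants=200, rho=0.98, maf=0.2, seed=1):
    """Simulate correlated allele counts (0/1/2); a hypothetical toy model.

    Each haplotype is a latent AR(1) Gaussian sequence thresholded so that
    every variant has minor allele frequency `maf`.
    """
    rng = np.random.default_rng(seed)
    cutoff = NormalDist().inv_cdf(maf)
    step = math.sqrt(1.0 - rho ** 2)
    genotypes = np.zeros((n_subjects, n_variants), dtype=float)
    for _ in range(2):  # two haplotypes per subject
        latent = np.empty((n_subjects, n_variants))
        latent[:, 0] = rng.standard_normal(n_subjects)
        for j in range(1, n_variants):
            noise = rng.standard_normal(n_subjects)
            latent[:, j] = rho * latent[:, j - 1] + step * noise
        genotypes += latent < cutoff
    return genotypes


def bootstrap_threshold(genotypes, alpha=0.05, n_boot=2000, seed=0):
    """Estimate the p value threshold that controls the familywise error rate."""
    rng = np.random.default_rng(seed)
    # Center each variant and scale it to unit norm, so that every score
    # statistic is standard normal under the null hypothesis.
    g = genotypes - genotypes.mean(axis=0)
    norms = np.linalg.norm(g, axis=0)
    keep = norms > 0  # drop monomorphic variants
    g = g[:, keep] / norms[keep]
    # Each row of `residuals` is one bootstrap draw of null trait residuals.
    residuals = rng.standard_normal((n_boot, g.shape[0]))
    max_abs_z = np.abs(residuals @ g).max(axis=1)
    z_crit = float(np.quantile(max_abs_z, 1.0 - alpha))
    # Two-sided normal p value that corresponds to the critical statistic.
    return math.erfc(z_crit / math.sqrt(2.0))


genotypes = toy_genotypes()
threshold = bootstrap_threshold(genotypes)
bonferroni = 0.05 / genotypes.shape[1]
print(f"bootstrap threshold:  {threshold:.2e}")
print(f"Bonferroni threshold: {bonferroni:.2e}")
```

With strongly correlated variants, as simulated here, the bootstrap threshold should come out less stringent than the Bonferroni value, which is the sense in which Bonferroni is conservative. Region-based tests and real sequencing data would require the test statistics and genotype handling described in the full paper.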

Keywords: SNPs; gene-based association tests; parametric bootstrap; sliding windows; whole-exome sequencing studies; whole-genome sequencing studies.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Exome
  • Genetic Association Studies* / methods
  • Genetic Association Studies* / standards
  • Genetic Association Studies* / statistics & numerical data
  • Genome, Human / genetics*
  • Genotype
  • High-Throughput Nucleotide Sequencing / methods
  • High-Throughput Nucleotide Sequencing / standards
  • High-Throughput Nucleotide Sequencing / statistics & numerical data
  • Humans
  • Models, Theoretical
  • Phenotype
  • Polymorphism, Single Nucleotide
  • Practice Guidelines as Topic
  • Reproducibility of Results
  • Whole Genome Sequencing* / methods
  • Whole Genome Sequencing* / statistics & numerical data