Estimation of a significance threshold for epigenome-wide association studies

Genet Epidemiol. 2018 Feb;42(1):20-33. doi: 10.1002/gepi.22086. Epub 2017 Oct 15.

Abstract

Epigenome-wide association studies (EWAS) are designed to characterise population-level epigenetic differences across the genome and link them to disease. Most commonly, they assess DNA-methylation status at cytosine-guanine dinucleotide (CpG) sites, using platforms such as the Illumina 450k array that profile a subset of CpGs genome wide. An important challenge in the context of EWAS is determining a significance threshold for declaring a CpG site differentially methylated while taking multiple testing into account. We used a permutation method to estimate a significance threshold specifically for the 450k array and a simulation extrapolation approach to estimate a genome-wide threshold. These methods were applied to five different EWAS datasets derived from a variety of populations and tissue types. We obtained an estimate of α = 2.4 × 10⁻⁷ for the 450k array and a genome-wide estimate of α = 3.6 × 10⁻⁸. We further demonstrate the importance of these results by showing that previously recommended sample sizes for EWAS should be adjusted upwards, requiring samples between ∼10% and ∼20% larger in order to keep the type 1 error rate at the desired level.

Keywords: CpG; DNA methylation; EWAS; FWER; GWAS; epigenetic epidemiology; permutation; resampling; simulation extrapolation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Aged, 80 and over
  • Bipolar Disorder / genetics
  • Colorectal Neoplasms / genetics
  • CpG Islands / genetics*
  • DNA Methylation*
  • Datasets as Topic
  • Depression / genetics
  • Epigenesis, Genetic / genetics*
  • Genome, Human / genetics*
  • Genome-Wide Association Study / methods*
  • Humans
  • Infant
  • Middle Aged
  • Models, Genetic
  • Sample Size
  • Young Adult