Integrating prior knowledge in multiple testing under dependence with applications to detecting differential DNA methylation

Biometrics. 2012 Sep;68(3):774-83. doi: 10.1111/j.1541-0420.2011.01730.x. Epub 2012 Jan 19.

Abstract

DNA methylation has emerged as an important hallmark of epigenetics. Numerous platforms, including tiling arrays and next-generation sequencing, and experimental protocols are available for profiling DNA methylation. Like other tiling array data, DNA methylation data exhibit an inherent correlation structure among nearby probes. However, unlike in gene expression or protein-DNA binding data, the varying CpG density, which gives rise to the definitions of CpG islands, shores, and shelves, provides exogenous information for detecting differential methylation. This article introduces a robust testing and probe-ranking procedure, based on a nonhomogeneous hidden Markov model, that incorporates both features: the correlation among nearby probes and the CpG density information. We revisit the seminal work of Sun and Cai (2009, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71, 393-424) and propose modeling the nonnull distribution with a nonparametric symmetric distribution in two-sided hypothesis testing. Extensive simulation studies show that this model improves probe ranking and is robust to model misspecification. We further illustrate, in an analysis of real DNA methylation data aimed at detecting differentially methylated sites, that the proposed framework achieves good operating characteristics compared with commonly used methods.
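To make the idea concrete, the following is a minimal sketch of how a two-state nonhomogeneous hidden Markov model can be turned into a probe ranking. It is not the authors' implementation. It assumes known parameters (no estimation step), a standard normal null, and a symmetric two-component Gaussian mixture as a parametric stand-in for the nonparametric symmetric nonnull distribution. The transition matrix is allowed to depend on a per-probe region label (for example island, shore, or shelf), which is one simple way to let CpG density inform the model. All function names, parameter values, and the `region` encoding are hypothetical. Probes are ranked by their posterior null probability, and the rejection rule follows the step-up rule on the running mean of sorted posterior null probabilities described by Sun and Cai (2009).

```python
import numpy as np


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) evaluated at z."""
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def emission_densities(z, mu=2.5):
    """Column 0: null N(0, 1). Column 1: symmetric nonnull mixture
    0.5 * N(-mu, 1) + 0.5 * N(mu, 1) (an assumed stand-in, see lead-in)."""
    f0 = normal_pdf(z)
    f1 = 0.5 * normal_pdf(z, -mu) + 0.5 * normal_pdf(z, mu)
    return np.column_stack([f0, f1])


def posterior_null_probabilities(z, region, trans_by_region,
                                 init=(0.9, 0.1), mu=2.5):
    """P(probe i is null | all z) via the scaled forward-backward algorithm.

    trans_by_region maps a region label to a 2x2 transition matrix; the
    transition into probe i uses the matrix for region[i] (hypothetical
    choice of how the covariate enters the model).
    """
    z = np.asarray(z, dtype=float)
    n = len(z)
    dens = emission_densities(z, mu)
    alpha = np.zeros((n, 2))
    beta = np.ones((n, 2))
    scale = np.zeros(n)

    # Forward pass, rescaled at each step to avoid underflow.
    alpha[0] = np.asarray(init, dtype=float) * dens[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for i in range(1, n):
        A = np.asarray(trans_by_region[region[i]], dtype=float)
        alpha[i] = (alpha[i - 1] @ A) * dens[i]
        scale[i] = alpha[i].sum()
        alpha[i] /= scale[i]

    # Backward pass, using the same scaling constants.
    for i in range(n - 2, -1, -1):
        A = np.asarray(trans_by_region[region[i + 1]], dtype=float)
        beta[i] = A @ (dens[i + 1] * beta[i + 1]) / scale[i + 1]

    post = alpha * beta
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 0]


def step_up_rejections(null_prob, level=0.05):
    """Reject the k probes with the smallest posterior null probabilities,
    where k is the largest count whose running mean stays <= level."""
    null_prob = np.asarray(null_prob, dtype=float)
    order = np.argsort(null_prob)
    running_mean = np.cumsum(null_prob[order]) / np.arange(1, len(null_prob) + 1)
    passing = np.nonzero(running_mean <= level)[0]
    k = passing[-1] + 1 if passing.size else 0
    reject = np.zeros(len(null_prob), dtype=bool)
    reject[order[:k]] = True
    return reject


if __name__ == "__main__":
    # Toy input: five probe-level test statistics and region labels
    # (0 = low CpG density, 1 = high CpG density; labels are illustrative).
    z = [0.1, -0.2, 4.0, 4.5, -0.3]
    region = [0, 0, 1, 1, 0]
    trans = {0: [[0.95, 0.05], [0.30, 0.70]],
             1: [[0.80, 0.20], [0.20, 0.80]]}
    lis = posterior_null_probabilities(z, region, trans)
    print("ranking (most significant first):", np.argsort(lis))
    print("rejected:", step_up_rejections(lis, level=0.05))
```

Because the posterior null probability of each probe borrows strength from its neighbors through the chain, two probes with the same test statistic can be ranked differently depending on their genomic context. A full analysis would also estimate the transition and emission parameters from the data, which this sketch omits.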

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biometry / methods*
  • CpG Islands
  • DNA Methylation*
  • Databases, Nucleic Acid / statistics & numerical data
  • Epigenesis, Genetic
  • Humans
  • Markov Chains
  • Models, Genetic
  • Models, Statistical*
  • Mutation
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data
  • Probability