Improved detection of epigenomic marks with mixed-effects hidden Markov models

Biometrics. 2019 Dec;75(4):1401-1413. doi: 10.1111/biom.13083. Epub 2019 Oct 17.

Abstract

Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is a technique to detect genomic regions containing protein-DNA interactions, such as transcription factor binding sites or regions containing histone modifications. One goal of the analysis of ChIP-seq experiments is to identify genomic loci enriched for sequencing reads pertaining to DNA bound to the factor of interest. The accurate identification of such regions aids in the understanding of epigenomic marks and gene regulatory mechanisms. Given the falling cost of massively parallel sequencing, methods to detect consensus regions of enrichment across multiple samples are of interest. Here, we present a statistical model to detect broad consensus regions of enrichment from ChIP-seq technical or biological replicates through a class of zero-inflated mixed-effects hidden Markov models. We show that the proposed model outperforms existing methods for consensus peak calling for common epigenomic marks by accounting for excess zeros and sample-specific biases. We apply our method to data from the Encyclopedia of DNA Elements and Roadmap Epigenomics projects, and evaluate it in an extensive simulation study.
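
As a rough illustration of the model class named above, and not the authors' implementation, the sketch below decodes a shared two-state path (background vs. enriched) from replicate read counts using a hidden Markov model with zero-inflated emissions. Several simplifying assumptions are made for illustration only: counts are treated as zero-inflated Poisson, the sample-specific effects are supplied as fixed log-scale offsets rather than estimated as random effects, all parameters are given rather than fitted, and the function names (`zip_logpmf`, `viterbi_consensus`) and the numeric values are hypothetical.

```python
import math


def zip_logpmf(y, rate, pi_zero):
    """Log-probability of count y under a zero-inflated Poisson.

    pi_zero is the probability of a structural (excess) zero.
    """
    log_pois = -rate + y * math.log(rate) - math.lgamma(y + 1)
    if y == 0:
        # A zero can come from the excess-zero component or from the Poisson.
        return math.log(pi_zero + (1.0 - pi_zero) * math.exp(log_pois))
    return math.log(1.0 - pi_zero) + log_pois


def viterbi_consensus(counts, rates, pi_zero, offsets, stay=0.95):
    """Most likely shared state path (0 = background, 1 = enriched).

    counts[s][t] : read count of sample s in genomic window t
    rates[k]     : baseline mean count in state k
    pi_zero[k]   : excess-zero probability in state k
    offsets[s]   : assumed log-scale sample-specific effect (fixed here)
    stay         : assumed probability of remaining in the same state
    """
    n_windows = len(counts[0])
    log_trans = [
        [math.log(stay), math.log(1.0 - stay)],
        [math.log(1.0 - stay), math.log(stay)],
    ]

    def emit(t, k):
        # Samples are conditionally independent given the shared state.
        return sum(
            zip_logpmf(counts[s][t], rates[k] * math.exp(offsets[s]), pi_zero[k])
            for s in range(len(counts))
        )

    score = [math.log(0.5) + emit(0, k) for k in range(2)]
    back = []
    for t in range(1, n_windows):
        new_score, pointers = [], []
        for k in range(2):
            best_prev = max(range(2), key=lambda j: score[j] + log_trans[j][k])
            new_score.append(score[best_prev] + log_trans[best_prev][k] + emit(t, k))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(2), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Two hypothetical replicates; the second is sequenced more deeply.
counts = [
    [0, 0, 1, 0, 12, 15, 9, 11, 0, 1, 0, 0],
    [0, 1, 0, 0, 20, 25, 18, 22, 1, 0, 0, 0],
]
path = viterbi_consensus(
    counts, rates=(0.5, 12.0), pi_zero=(0.5, 0.05), offsets=[0.0, 0.6]
)
print(path)
```

Summing the per-sample log-likelihoods at each window is what makes the decoded region a consensus call across replicates; the zero-inflation term keeps runs of empty windows from being over-penalized, and the offsets stand in for sample-specific biases such as sequencing depth.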

Keywords: ChIP-seq; hidden Markov model; mixed model.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites*
  • Computer Simulation
  • DNA / metabolism
  • DNA-Binding Proteins / analysis
  • Epigenomics / methods*
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Markov Chains*
  • Sequence Analysis, DNA*

Substances

  • DNA-Binding Proteins
  • DNA