Computationally Tractable Multivariate HMM in Genome-Wide Mapping Studies

Methods Mol Biol. 2017:1552:135-148. doi: 10.1007/978-1-4939-6753-7_10.


Hidden Markov models (HMMs) are widely used for modeling spatially correlated genomic data (series data). In genomics, such datasets are generated by genome-wide mapping studies using high-throughput methods such as chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-seq). When multiple regulatory protein binding sites or related epigenetic modifications are mapped simultaneously, the correlation between data series can be incorporated into latent variable inference through a multivariate form of the HMM, potentially increasing the statistical power of signal detection. In this chapter, we review the challenges of multivariate HMMs and propose a computationally tractable method called sparsely correlated HMMs (scHMM). We illustrate the method and the scHMM package using an example mouse ChIP-seq dataset.

Keywords: Genome-wide mapping study; Hidden Markov model.

MeSH terms

  • Algorithms
  • Animals
  • Binding Sites
  • Chromatin Immunoprecipitation / methods*
  • Chromosome Mapping / methods*
  • Computational Biology / methods*
  • Epigenesis, Genetic
  • Genome*
  • Genomics / methods*
  • Markov Chains*
  • Mice
  • Regulatory Sequences, Nucleic Acid
  • Transcription Factors / metabolism

