HMMRATAC: a Hidden Markov ModeleR for ATAC-seq

Nucleic Acids Res. 2019 Sep 19;47(16):e91. doi: 10.1093/nar/gkz533.

Abstract

ATAC-seq has been widely adopted to identify accessible chromatin regions across the genome. However, current data analysis still relies on approaches initially designed for ChIP-seq or DNase-seq, which do not take into account that the transposase-digested DNA fragments contain additional nucleosome positioning information. We present the first dedicated ATAC-seq analysis tool, a semi-supervised machine learning approach named HMMRATAC. HMMRATAC splits a single ATAC-seq dataset into nucleosome-free and nucleosome-enriched signals, learns the unique chromatin structure around accessible regions, and then predicts accessible regions across the entire genome. We show that HMMRATAC outperforms popular peak-calling algorithms on published human ATAC-seq datasets. We also find that single-end sequenced or size-selected ATAC-seq datasets result in a loss of sensitivity compared to paired-end datasets without size selection.
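The signal-splitting step described above can be illustrated with a minimal sketch. It assumes that fragment lengths are already known for each read pair (typically only the case for paired-end data) and that a simple fixed-cutoff rule is acceptable for illustration. The function name `split_by_fragment_length` and both cutoff values are hypothetical and are not taken from the paper; the abstract does not state how HMMRATAC itself assigns fragments to each signal.

```python
from typing import Dict, Iterable, List

# Hypothetical cutoffs in base pairs, chosen for illustration only.
# They are assumptions, not values reported in the abstract.
NFR_MAX_BP = 100          # fragments shorter than this count as nucleosome-free
NUCLEOSOME_MIN_BP = 180   # fragments at least this long count as nucleosome-enriched


def split_by_fragment_length(lengths: Iterable[int]) -> Dict[str, List[int]]:
    """Partition fragment lengths into nucleosome-free, nucleosome-enriched,
    and ambiguous groups using fixed length cutoffs."""
    signals: Dict[str, List[int]] = {
        "nucleosome_free": [],
        "nucleosome_enriched": [],
        "ambiguous": [],
    }
    for length in lengths:
        if length < NFR_MAX_BP:
            signals["nucleosome_free"].append(length)
        elif length >= NUCLEOSOME_MIN_BP:
            signals["nucleosome_enriched"].append(length)
        else:
            # Lengths between the two cutoffs are left unassigned in this sketch.
            signals["ambiguous"].append(length)
    return signals


if __name__ == "__main__":
    example = split_by_fragment_length([50, 99, 100, 150, 180, 400])
    for name, values in example.items():
        print(name, values)
```

A hard cutoff is used here only because it is the simplest rule to read; a probabilistic assignment of fragments to each class would be an equally plausible design under the same description.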

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA / genetics*
  • Datasets as Topic
  • Genome, Human
  • High-Throughput Nucleotide Sequencing
  • Histones / genetics
  • Histones / metabolism
  • Humans
  • Markov Chains
  • Nucleosomes / chemistry*
  • Sequence Analysis, DNA
  • Software*
  • Supervised Machine Learning*
  • Transposases / genetics
  • Transposases / metabolism

Substances

  • Histones
  • Nucleosomes
  • DNA
  • Transposases