Using HaMMLET for Bayesian Segmentation of WGS Read-Depth Data

Methods Mol Biol. 2018;1833:83-93. doi: 10.1007/978-1-4939-8666-8_6.

Abstract

CNV detection requires a high-quality segmentation of genomic data. In many WGS experiments, sample and control are sequenced together in a multiplexed fashion using DNA barcoding for economic reasons. Taking the differential read depth of these two conditions cancels out systematic additive errors. Due to this detrending, the resulting data are appropriate for inference using a hidden Markov model (HMM), arguably one of the principal models for labeled segmentation. However, although the usual frequentist approaches such as Baum-Welch are problematic for several reasons, they are often preferred to Bayesian HMM inference, which normally requires prohibitively long running times and exceeds a typical user's computational resources on genome-scale data. HaMMLET solves this problem using a dynamic wavelet compression scheme, which makes Bayesian segmentation of WGS data feasible on standard consumer hardware.
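
The detrending step described above can be illustrated with a minimal sketch. It is not HaMMLET's code or its input format. The bin count, the linear bias, the noise level, and the function name differential_depth are all hypothetical, chosen only to show that an additive error shared by sample and control drops out of the per-bin difference.

```python
import random


def differential_depth(sample, control):
    """Return the per-bin difference of sample and control read depth."""
    if len(sample) != len(control):
        raise ValueError("sample and control must cover the same bins")
    return [s - c for s, c in zip(sample, control)]


random.seed(0)
n_bins = 200

# Hypothetical systematic additive error shared by both barcoded libraries.
bias = [0.05 * i for i in range(n_bins)]

# Hypothetical true signal: a gain of 2.0 in bins 80..119, present in the sample only.
gain = [2.0 if 80 <= i < 120 else 0.0 for i in range(n_bins)]

# Simulated read depth per bin: baseline + shared bias (+ gain) + independent noise.
control = [10.0 + b + random.gauss(0, 0.1) for b in bias]
sample = [10.0 + b + g + random.gauss(0, 0.1) for b, g in zip(bias, gain)]

# The shared bias cancels; only the gain and the noise remain.
diff = differential_depth(sample, control)

mean_inside = sum(diff[80:120]) / 40
mean_outside = (sum(diff[:80]) + sum(diff[120:])) / 160
```

In this toy setting the difference is close to 2.0 inside the simulated gain and close to 0 elsewhere, even though both raw tracks drift upward along the bins. A piecewise-constant signal of this kind is what a segmentation model such as an HMM is then asked to label.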

Keywords: Bayesian inference; CNV; HaMMLET; Hidden Markov Model; Segmentation; Whole genome sequencing.

MeSH terms

  • DNA Barcoding, Taxonomic / methods*
  • High-Throughput Nucleotide Sequencing*
  • Markov Chains
  • Sequence Analysis, DNA / methods*