Nucleic Acids Res. 2014 Apr;42(6):e45.
doi: 10.1093/nar/gkt1373. Epub 2014 Jan 13.

Bisulfighter: Accurate Detection of Methylated Cytosines and Differentially Methylated Regions

Free PMC article

Yutaka Saito et al. Nucleic Acids Res.

Abstract

Analysis of bisulfite sequencing data usually requires two tasks: to call methylated cytosines (mCs) in a sample, and to detect differentially methylated regions (DMRs) between paired samples. Although numerous tools have been proposed for mC calling, methods for DMR detection have been largely limited. Here, we present Bisulfighter, a new software package for detecting mCs and DMRs from bisulfite sequencing data. Bisulfighter combines the LAST alignment tool for mC calling, and a novel framework for DMR detection based on hidden Markov models (HMMs). Unlike previous attempts that depend on empirical parameters, Bisulfighter can use the expectation-maximization algorithm for HMMs to adjust parameters for each data set. We conduct extensive experiments in which accuracy of mC calling and DMR detection is evaluated on simulated data with various mC contexts, read qualities, sequencing depths and DMR lengths, as well as on real data from a wide range of biological processes. We demonstrate that Bisulfighter consistently achieves better accuracy than other published tools, providing greater sensitivity for mCs with fewer false positives, more precise estimates of mC levels, more exact locations of DMRs and better agreement of DMRs with gene expression and DNase I hypersensitivity. The source code is available at http://epigenome.cbrc.jp/bisulfighter.
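To make the idea of HMM-based DMR detection concrete, here is a toy sketch in Python. It is not Bisulfighter's model: the three state labels (UP, DOWN, NoCh) are borrowed from the Figure 1 caption, while the Gaussian emissions, the state means, `sigma` and the `stay` probability are all invented for illustration. The parameters are fixed by hand here, whereas the abstract states that Bisulfighter can fit its parameters with the expectation-maximization algorithm.

```python
import math
from itertools import groupby

# State labels follow the Figure 1 caption; everything numeric is an assumption.
STATES = ("UP", "DOWN", "NoCh")
ASSUMED_MEANS = {"UP": 0.5, "DOWN": -0.5, "NoCh": 0.0}  # hypothetical mean differences


def emission_logp(state, diff, sigma=0.2):
    """Unnormalized Gaussian log-likelihood of a methylation difference."""
    z = (diff - ASSUMED_MEANS[state]) / sigma
    return -0.5 * z * z  # constants dropped: sigma is shared by all states


def viterbi(diffs, stay=0.9):
    """Most likely state path for per-cytosine differences (sample 2 - sample 1)."""
    if not diffs:
        return []
    log_stay = math.log(stay)
    log_switch = math.log((1.0 - stay) / (len(STATES) - 1))

    def trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Uniform initial distribution, so it is omitted from the scores.
    score = {s: emission_logp(s, diffs[0]) for s in STATES}
    backpointers = []
    for diff in diffs[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda prev: score[prev] + trans(prev, cur))
            pointers[cur] = best_prev
            new_score[cur] = (
                score[best_prev] + trans(best_prev, cur) + emission_logp(cur, diff)
            )
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def segments(path):
    """Group consecutive UP or DOWN positions into (state, start, end) runs, end exclusive."""
    out, start = [], 0
    for state, run in groupby(path):
        length = len(list(run))
        if state != "NoCh":
            out.append((state, start, start + length))
        start += length
    return out
```

Calling `segments(viterbi(diffs))` on a list of per-cytosine methylation differences returns index ranges over that list. The transition penalty is what lets a single noisy position inside a run stay in the same segment rather than splitting it.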

Figures

Figure 1.
Overview of Bisulfighter. (a) mC calling. Bisulfite-converted reads are aligned to a reference genome, and the mC level is estimated as a ratio of C–C matches. A major feature is the utilization of alignment probability for filtering out unreliable alignments, and for weighting mC level estimates. (b) DMR detection. Neighbor cytosines differentially methylated between paired samples are grouped as a DMR (UP or DOWN). A novel HMM-based framework enables automatic learning of chaining criteria, and detection of DMRs using likelihood ratio scores. Colors in the state transition track correspond to those in the state transition diagram at the top. NoCh: no change of methylation between paired samples.
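The mC calling step in panel (a) can be illustrated with a small Python sketch. It assumes each read covering a reference cytosine contributes its observed base and an alignment probability. The function name, the `min_prob` cutoff and the exact weighting formula are assumptions made for illustration and may differ from what Bisulfighter actually computes.

```python
def weighted_mc_level(observations, min_prob=0.5):
    """Estimate the mC level at one reference cytosine.

    observations: iterable of (read_base, alignment_probability) pairs.
    After bisulfite conversion a read 'C' suggests a methylated cytosine and a
    read 'T' an unmethylated one; other bases are ignored.
    min_prob is a hypothetical cutoff for discarding unreliable alignments.
    Returns None when no usable read covers the position.
    """
    methylated_weight = 0.0
    total_weight = 0.0
    for base, prob in observations:
        if prob < min_prob:
            continue  # filter out unreliable alignments
        if base not in ("C", "T"):
            continue  # mismatch or sequencing error: not informative here
        total_weight += prob
        if base == "C":
            methylated_weight += prob  # C-C match, weighted by alignment probability
    if total_weight == 0.0:
        return None
    return methylated_weight / total_weight
```

For example, two reliable reads showing C and one showing T with half the alignment probability give a level above 2/3, because the less reliable T counts for less than a full observation.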
Figure 2.
Benchmark for mC calling. (a and b) Binary classification of mCs. CpGs were called as mCs if nonzero mC levels were estimated. (a) Trade-off between the true-positive rate and the number of false positives for varying sequencing depths. (b) The true-positive rate and the FDR at the limited sequencing depth of 3M reads (shown as dots in a). For Bisulfighter, 3M reads were equivalent to the mean coverage of 2.4 among those CpGs with at least one aligned read. True-positive rates plateaued around 0.9 due to low quality of simulated reads. (c) Estimation of mC levels. Distributions of errors between estimated and true mC levels are shown as box plots (top; 25th–75th percentile), and histograms for BatMeth, RMAP and Bisulfighter (bottom). The complete results including non-CpG contexts, high-quality reads and higher sequencing depths are found in Supplementary Figures S5–S7.
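The binary classification in panels (a) and (b) can be sketched as follows in Python. The caption states that a CpG is called methylated when its estimated mC level is nonzero. The sketch additionally assumes that a CpG is truly methylated when its simulated level is nonzero, and that CpGs with no estimate count as not called. The function name and the dictionary layout are hypothetical.

```python
def evaluate_calls(true_levels, estimated_levels):
    """Compare mC calls against simulated truth.

    true_levels: dict mapping position -> true mC level.
    estimated_levels: dict mapping position -> estimated mC level
        (positions without aligned reads may be missing).
    """
    tp = fp = fn = 0
    for pos, true_level in true_levels.items():
        called = estimated_levels.get(pos, 0.0) > 0.0  # nonzero estimate => call
        methylated = true_level > 0.0  # assumption about the ground truth
        if called and methylated:
            tp += 1
        elif called:
            fp += 1
        elif methylated:
            fn += 1
    return {
        "true_positives": tp,
        "false_positives": fp,
        # True-positive rate: fraction of truly methylated CpGs that were called.
        "tpr": tp / (tp + fn) if tp + fn else None,
        # False discovery rate: fraction of calls that were wrong.
        "fdr": fp / (tp + fp) if tp + fp else None,
    }
```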
Figure 3.
Benchmark for DMR detection. (a and b) Experiments on simulated data. (a) For various DMR lengths and the fixed sequencing depth of 50M reads, true positives with 50 (left), 90 (center) or 99% (right) reciprocal overlap are shown. Ind: simulation with independence of neighbor positions. (b) For varying sequencing depths and the fixed DMR length of 500 bp, true positives with 50% reciprocal overlap are shown. (c and d) Experiments on real data. (c) Agreement between detected DMRs and gene expression. Carcino: Carcinogenesis data set. Adipo: Adipogenesis data set. See the ‘Experiments on real data—gene expression’ section for details. (d) Agreement between detected DMRs and DNase I hypersensitivity. Hemato: Hematopoiesis data set. Fibro: Fibroblast development data set. See the ‘Experiments on real data—DNase I hypersensitivity’ section for details. dual, naive: results for the corresponding HMM architectures in Bisulfighter.
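The reciprocal overlap criterion used in panels (a) and (b) can be written as a short Python check. This sketch uses the common definition in which the shared span must cover at least the given fraction of both the predicted and the true region. Half-open `(start, end)` coordinates and the function name are assumptions, and the paper's exact definition is not given in this excerpt.

```python
def has_reciprocal_overlap(predicted, truth, min_fraction):
    """Return True if two regions overlap by at least min_fraction of each.

    predicted, truth: (start, end) tuples with start < end, end exclusive.
    min_fraction: e.g. 0.5, 0.9 or 0.99 as in the benchmark thresholds.
    """
    p_start, p_end = predicted
    t_start, t_end = truth
    overlap = min(p_end, t_end) - max(p_start, t_start)
    if overlap <= 0:
        return False  # disjoint or merely adjacent regions
    return (
        overlap >= min_fraction * (p_end - p_start)
        and overlap >= min_fraction * (t_end - t_start)
    )
```

A 500 bp prediction shifted by 100 bp relative to a 500 bp true DMR shares 400 bp (80%) with it, so it would count as a true positive at the 50% threshold but not at 90%. The requirement on both regions also prevents one very long prediction from matching many short true DMRs.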
