Comparing five statistical methods of differential methylation identification using bisulfite sequencing data

Stat Appl Genet Mol Biol. 2016 Apr;15(2):173-91. doi: 10.1515/sagmb-2015-0078.


We present a comprehensive comparative analysis of five differential methylation (DM) identification methods developed for bisulfite sequencing (BS) data: methylKit, BSmooth, BiSeq, HMM-DM, and HMM-Fisher. We summarize the features of these methods from several analytical aspects and compare their performance on both simulated and real BS datasets. Our main findings are as follows. First, parameter settings can strongly affect the accuracy of DM identification: compared with the default settings, modified parameter settings yield higher sensitivity and/or lower false positive rates. Second, all five methods are more accurate when identifying simulated DM regions that are long and have small within-group variation, but concordance among the methods is low, probably because of the different approaches they use for DM identification. Third, HMM-DM and HMM-Fisher yield relatively higher sensitivity and lower false positive rates than the other methods, especially in DM regions with large variation. Finally, among the three methods that involve methylation estimation (methylKit, BSmooth, and BiSeq), BiSeq best represents the raw methylation signals. Based on these results, we suggest that users select a DM identification method according to the characteristics of their data and the advantages of each method.
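
The abstract compares methods by sensitivity, false positive rate, and concordance but does not define them. The sketch below shows one plausible way to compute such quantities on simulated data where the true DM CpG sites are known. The function names, the CpG-level (rather than region-level) scoring, and the use of the Jaccard index as a concordance measure are assumptions made for illustration, not the definitions used in the paper. The method names and positions are hypothetical.

```python
from itertools import combinations


def sensitivity_fpr(called, truth, universe):
    """Sensitivity and false positive rate of one method's DM calls.

    called   -- CpG positions the method reports as DM
    truth    -- CpG positions simulated as truly DM
    universe -- all CpG positions tested
    """
    universe = set(universe)
    called = set(called) & universe
    truth = set(truth) & universe
    true_pos = len(called & truth)
    false_pos = len(called - truth)
    n_negative = len(universe - truth)
    sens = true_pos / len(truth) if truth else float("nan")
    fpr = false_pos / n_negative if n_negative else float("nan")
    return sens, fpr


def jaccard(a, b):
    """Overlap of two call sets: |A and B| / |A or B| (assumed concordance measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_concordance(calls):
    """Jaccard index for every pair of methods in a {name: call set} mapping."""
    return {
        (m1, m2): jaccard(calls[m1], calls[m2])
        for m1, m2 in combinations(sorted(calls), 2)
    }


if __name__ == "__main__":
    # Toy data: ten CpG sites, four of them truly DM (all values hypothetical).
    universe = range(1, 11)
    truth = {2, 3, 4, 5}
    calls = {"method_A": {2, 3, 4, 8}, "method_B": {3, 4, 5}}

    for name, called in calls.items():
        sens, fpr = sensitivity_fpr(called, truth, universe)
        print(f"{name}: sensitivity={sens:.2f}, FPR={fpr:.2f}")
    print(pairwise_concordance(calls))
```

In this toy case both hypothetical methods recover three of the four true sites, yet they share only two calls. The same kind of gap, similar accuracy with low overlap, is what the abstract describes as low concordance.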

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • CpG Islands / genetics
  • DNA Methylation / genetics*
  • Genome, Human
  • High-Throughput Nucleotide Sequencing / statistics & numerical data*
  • Humans
  • Sequence Analysis, DNA / statistics & numerical data*
  • Software