Pipeline Olympics: continuable benchmarking of computational workflows for DNA methylation sequencing data against an experimental gold standard

Nucleic Acids Res. 2025 Oct 14;53(19):gkaf970. doi: 10.1093/nar/gkaf970.

Abstract

DNA methylation is a widely studied epigenetic mark and a powerful biomarker of cell type, age, environmental exposures, and disease. Whole-genome sequencing following selective conversion of unmethylated cytosines into uracils (read as thymines after sequencing) via bisulfite treatment or enzymatic methods remains the reference method for genome-wide DNA methylation profiling. While numerous software tools facilitate processing of DNA methylation sequencing reads, a comprehensive benchmarking study has been lacking. In this study, we systematically compared complete computational workflows for processing DNA methylation sequencing data using a dedicated benchmarking dataset generated with five whole-genome profiling protocols. As an evaluation reference, we employed accurate locus-specific measurements from our previous benchmark of targeted DNA methylation assays. Based on this experimental gold-standard assessment and multiple performance metrics, we identified workflows that consistently demonstrated superior performance and revealed major workflow development trends. To ensure the long-term utility of our benchmark, we implemented an interactive workflow execution and data presentation platform, adaptable to user-defined criteria and readily expandable to future software.
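
To illustrate what comparing a workflow against a locus-specific gold standard can look like, the sketch below derives per-locus methylation levels from read counts and scores them against reference values with a mean absolute deviation. This is a minimal illustration under stated assumptions, not the study's evaluation procedure: the abstract does not name its performance metrics, and the function names, locus labels, and numbers here are all hypothetical.

```python
from statistics import mean
from typing import Dict, Optional, Tuple


def methylation_level(methylated: int, unmethylated: int) -> Optional[float]:
    """Fraction of reads supporting methylation at one cytosine, or None if uncovered."""
    total = methylated + unmethylated
    if total == 0:
        return None
    return methylated / total


def mean_absolute_deviation(
    calls: Dict[str, Optional[float]], reference: Dict[str, float]
) -> float:
    """Average |call - reference| over loci present in both inputs.

    Assumption: this metric is chosen for illustration only; it is not
    stated in the source to be one of the benchmark's metrics.
    """
    shared = [locus for locus in reference if calls.get(locus) is not None]
    if not shared:
        raise ValueError("no loci shared between calls and reference")
    return mean(abs(calls[locus] - reference[locus]) for locus in shared)


# Hypothetical gold-standard methylation levels from a targeted assay.
reference = {"locus_A": 0.90, "locus_B": 0.10, "locus_C": 0.50}

# Hypothetical (methylated, unmethylated) read counts reported by one workflow.
counts: Dict[str, Tuple[int, int]] = {
    "locus_A": (18, 2),
    "locus_B": (1, 9),
    "locus_C": (6, 4),
}

calls = {locus: methylation_level(m, u) for locus, (m, u) in counts.items()}
mad = mean_absolute_deviation(calls, reference)
print(f"mean absolute deviation: {mad:.4f}")
```

A lower value means the workflow's calls sit closer to the reference measurements. A real comparison would also need to handle coverage thresholds and strand merging, which this sketch leaves out.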

MeSH terms

  • Benchmarking
  • Computational Biology* / methods
  • DNA Methylation*
  • Epigenesis, Genetic
  • Epigenomics / methods
  • Humans
  • Sequence Analysis, DNA* / methods
  • Sequence Analysis, DNA* / standards
  • Software*
  • Whole Genome Sequencing / methods
  • Whole Genome Sequencing / standards
  • Workflow