MeSiC: A Model-Based Method for Estimating 5mC Levels at Single-CpG Resolution from MeDIP-seq

Sci Rep. 2015 Oct 1:5:14699. doi: 10.1038/srep14699.

Abstract

As the fifth base of the mammalian genome, 5-methylcytosine (5mC) is essential for many biological processes, including normal development and disease. Methylated DNA immunoprecipitation sequencing (MeDIP-seq), which uses anti-5mC antibodies to enrich the methylated fraction of the genome, is widely used to investigate the methylome at a resolution of 100-500 bp. To address the CpG density-dependent bias and limited resolution of MeDIP-seq, we developed MeSiC, a Random Forest Regression (RFR) model-based method that estimates DNA methylation levels at single-base resolution. MeSiC integrates the MeDIP-seq signals of CpG sites and their surrounding neighbors, together with genomic features, to construct genomic element-dependent RFR models. In the H1 cell line, MeSiC predictions were highly correlated with actual 5mC levels. MeSiC also calibrated the CpG density-dependent bias of MeDIP-seq signals. Importantly, we found that MeSiC models constructed in the H1 cell line could accurately predict DNA methylation levels in other cell types. In comparisons with methylCRF and MEDIPS, MeSiC achieved comparable or better performance. These results demonstrate that MeSiC can provide accurate estimates of 5mC levels at single-CpG resolution using MeDIP-seq data alone.
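The abstract describes per-CpG inputs built from a site's own MeDIP-seq signal plus that of its neighbors, separate models per genomic element, and evaluation by correlation with measured 5mC levels. The sketch below illustrates only those data-handling steps, under stated assumptions: the feature set (own signal, mean neighbor signal, local CpG count), the 250 bp window, and the names `cpg_features`, `group_by_element`, and `pearson` are hypothetical and are not MeSiC's published features or code. The random forest regressor itself is not shown; each element group would be fitted with a regressor such as scikit-learn's `RandomForestRegressor`.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from math import sqrt


def cpg_features(positions, signals, index, window=250):
    """Hypothetical feature vector for the CpG at positions[index].

    Assumes `positions` is sorted. Returns [own MeDIP-seq signal,
    mean signal of other CpGs within +/- window bp, number of CpGs in
    the window including this one (a crude CpG-density proxy)].
    """
    center = positions[index]
    lo = bisect_left(positions, center - window)
    hi = bisect_right(positions, center + window)
    neighbors = [signals[i] for i in range(lo, hi) if i != index]
    mean_neighbor = sum(neighbors) / len(neighbors) if neighbors else 0.0
    return [signals[index], mean_neighbor, float(hi - lo)]


def group_by_element(rows):
    """Split (element, features, target) rows by genomic element label,
    so that one regressor can be trained per element class."""
    groups = defaultdict(list)
    for element, features, target in rows:
        groups[element].append((features, target))
    return dict(groups)


def pearson(x, y):
    """Pearson correlation between predicted and measured 5mC levels."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / sqrt(sxx * syy)


# Usage with toy values (not data from the paper):
positions = [100, 150, 400, 1000]
signals = [2.0, 4.0, 6.0, 1.0]
print(cpg_features(positions, signals, 1))  # [4.0, 4.0, 3.0]
```

An isolated CpG (no neighbors within the window) gets a neighbor mean of 0.0 here; how MeSiC actually handles such sites is not stated in the abstract.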

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 5-Methylcytosine*
  • Animals
  • Computational Biology / methods*
  • CpG Islands*
  • DNA Methylation*
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Models, Statistical*
  • Reproducibility of Results

Substances

  • 5-Methylcytosine