i4mC-ROSE, a bioinformatics tool for the identification of DNA N4-methylcytosine sites in the Rosaceae genome

Int J Biol Macromol. 2020 Aug 15;157:752-758. doi: 10.1016/j.ijbiomac.2019.12.009. Epub 2019 Dec 2.

Abstract

N4-methylcytosine (4mC) is one of the most important epigenetic modifications and regulates many biological processes, including DNA replication and chromosome stability. Identifying 4mC sites is pivotal to understanding specific biological functions. Herein, we developed i4mC-ROSE, the first bioinformatics tool for identifying 4mC sites in the genomes of Fragaria vesca and Rosa chinensis, two species of the Rosaceae family. i4mC-ROSE uses a random forest classifier built on six encoding methods that together cover various aspects of DNA sequence information. During cross-validation, the predictor achieves area under the ROC curve (AUC) scores of 0.883 and 0.889 for the two genomes. Moreover, when objectively evaluated on independent datasets, i4mC-ROSE outperforms the other classifiers tested in this study. The proposed tool can therefore meet users' need to predict 4mC sites in Rosaceae genomes. The i4mC-ROSE predictor and the datasets used are publicly accessible at http://kurata14.bio.kyutech.ac.jp/i4mC-ROSE/.
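The abstract does not name the six encoding methods, so the following is only an illustrative sketch under stated assumptions. It substitutes two common DNA encodings, binary (one-hot) encoding and normalized k-mer composition, concatenates them into one feature vector per sequence window, and computes the AUC metric reported above from classifier scores using the rank (Mann-Whitney) formulation. The function names (`one_hot`, `kmer_composition`, `encode`, `roc_auc`) and the chosen k values are hypothetical and are not taken from i4mC-ROSE.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# ASSUMPTION: hypothetical stand-ins for the six (unnamed) i4mC-ROSE encodings,
# namely binary (one-hot) encoding plus normalized k-mer composition.


def one_hot(seq):
    """Binary encoding: each base becomes a 4-bit indicator over A, C, G, T.
    Ambiguous bases such as N map to all zeros."""
    features = []
    for base in seq.upper():
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def kmer_composition(seq, k=2):
    """Frequency of each of the 4**k possible k-mers, normalized to sum to 1."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows that contain ambiguous bases
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def encode(seq):
    """Concatenate several encodings into one feature vector for a sequence window."""
    return one_hot(seq) + kmer_composition(seq, k=1) + kmer_composition(seq, k=2)


def roc_auc(labels, scores):
    """Area under the ROC curve as the probability that a random positive
    outscores a random negative (Mann-Whitney form; ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy inputs only; the window length here is arbitrary.
    print(len(encode("ACGTCAGTA")))  # 9*4 + 4 + 16 = 56
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

In practice the vectors returned by `encode` would be stacked into a feature matrix and passed to a random forest, for example scikit-learn's `RandomForestClassifier`, with `cross_val_score(..., scoring="roc_auc")` giving a cross-validated AUC. The encodings, window length, and classifier settings in this sketch are assumptions, not the published i4mC-ROSE configuration.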

Keywords: DNA methylation; Linear regression; Machine learning; N4-methylcytosine site; Sequence encoding.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Cytosine* / metabolism
  • DNA Methylation*
  • Databases, Genetic
  • Epigenesis, Genetic*
  • Epigenomics / methods*
  • Genome, Plant*
  • Machine Learning
  • ROC Curve
  • Reproducibility of Results
  • Rosaceae / genetics*
  • Web Browser

Substances

  • Cytosine