iDNA-Methyl: identifying DNA methylation sites via pseudo trinucleotide composition

Anal Biochem. 2015 Apr 1;474:69-77. doi: 10.1016/j.ab.2014.12.009. Epub 2015 Jan 14.

Abstract

Occurring predominantly on cytosine, DNA methylation is a process by which cells modify their DNA to change the expression of gene products. It plays important roles not only in development but also in the formation of nearly all types of cancer. Knowledge of DNA methylation sites is therefore significant for both basic research and drug development. Given an uncharacterized DNA sequence containing many cytosine residues, which ones can be methylated and which cannot? With the avalanche of DNA sequences generated in the postgenomic age, computational methods for accurately identifying methylation sites in DNA are highly desirable. Using the trinucleotide composition, pseudo amino acid components, and a dataset-optimizing technique, we have developed a new predictor called "iDNA-Methyl" that has achieved remarkably higher success rates in identifying DNA methylation sites than the existing predictors. A user-friendly web server for the new predictor has been established at http://www.jci-bioinfo.cn/iDNA-Methyl, where users can easily obtain their desired results. We anticipate that the web-server predictor will become a very useful high-throughput tool for basic research and drug development, and that the novel approach and technique can also be applied to many other DNA-related problems and to genome analysis.
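
To illustrate the first ingredient named in the abstract, the following is a minimal sketch of a plain trinucleotide composition: the normalized frequency of each of the 64 possible trinucleotides in a DNA segment. It is not the iDNA-Methyl implementation. The abstract does not specify the window size, whether trinucleotides are counted with overlap, or how the pseudo amino acid components are derived, so the overlapping sliding window, the function name `trinucleotide_composition`, and the example sequence are all assumptions made for illustration.

```python
from itertools import product

# All 64 trinucleotides in a fixed lexicographic order (AAA, AAC, ..., TTT).
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_composition(seq):
    """Return a 64-element list of normalized trinucleotide frequencies.

    Assumption: trinucleotides are counted with an overlapping sliding
    window of step 1. The source abstract does not state this detail.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:  # skip windows containing ambiguous bases such as N
            counts[tri] += 1
            total += 1
    if total == 0:
        # Sequence too short or fully ambiguous: return an all-zero vector.
        return [0.0] * len(TRINUCLEOTIDES)
    return [counts[t] / total for t in TRINUCLEOTIDES]


# Hypothetical example segment, not taken from the paper's dataset.
features = trinucleotide_composition("ACGTACGTCG")
```

A fixed-length vector of this kind could be passed to a classifier such as the support vector machine listed under the MeSH terms. The predictor described in the abstract additionally relies on pseudo amino acid components and a dataset-optimizing technique, neither of which is shown here.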

Keywords: 3→1 Codon conversion; DNA methylation; Neighborhood cleaning rule; Pseudo amino acid components; Synthetic minority oversampling technique; Target–jackknife cross-validation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / metabolism
  • Base Sequence
  • Codon / genetics
  • Computational Biology / methods*
  • DNA Methylation / genetics*
  • Databases, Genetic
  • Humans
  • Internet
  • Nucleotides / metabolism*
  • ROC Curve
  • Reproducibility of Results
  • Software*
  • Support Vector Machine

Substances

  • Amino Acids
  • Codon
  • Nucleotides