Identifying 5-methylcytosine sites in RNA sequence using composite encoding feature into Chou's PseKNC

J Theor Biol. 2018 Sep 7;452:1-9. doi: 10.1016/j.jtbi.2018.04.037. Epub 2018 May 1.

Abstract

This study presents an accurate and efficient computational method for identifying 5-methylcytosine (m5C) sites in RNA. The m5C modification plays a vital role in a number of biological processes, so recognizing m5C sites in RNA precisely is necessary for a better understanding of its biological functions and mechanisms. Laboratory techniques can identify m5C sites in RNA, but they require considerable time and resources. This study develops a new computational method for extracting features from RNA sequences. First, each RNA sequence is encoded as a composite feature vector, and the minimum-redundancy-maximum-relevance algorithm is used to select discriminative features. Second, a support vector machine classifies the sequences, and its performance is assessed with the jackknife cross-validation test. The proposed method efficiently distinguishes m5C sites from non-m5C sites, achieving an accuracy of 93.33% with a sensitivity of 90.0% and a specificity of 96.66% on benchmark datasets. These results show that the proposed algorithm delivers a significant improvement in identification performance over existing computational techniques. This study extends knowledge of where RNA modifications occur, which paves the way for a better understanding of their biological roles and mechanisms.
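As a rough illustration of the encoding and evaluation steps summarized in the abstract, the following minimal Python sketch builds a composite feature vector by concatenating normalized k-mer frequencies and computes sensitivity, specificity, and accuracy from confusion-matrix counts. The choice of k = 1 to 4 (mono- through tetra-nucleotide composition) is an assumption suggested only by the "Tetra nucleotide" keyword; the abstract does not specify the exact encoding. The function names `kmer_composition`, `composite_features`, and `binary_metrics` are hypothetical, and the feature selection (minimum-redundancy-maximum-relevance) and support vector machine steps are not shown.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA nucleotides


def kmer_composition(seq, k):
    """Return normalized frequencies of all 4**k k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of sliding windows of length k
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # windows with ambiguous bases are skipped
            counts[window] += 1
    denom = max(total, 1)  # avoid division by zero for very short sequences
    return [counts[m] / denom for m in kmers]


def composite_features(seq, ks=(1, 2, 3, 4)):
    """Concatenate k-mer compositions for several k (assumed composite encoding)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    features = []
    for k in ks:
        features.extend(kmer_composition(seq, k))
    return features


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical sequence, not taken from the paper's benchmark datasets.
    vec = composite_features("GGACUCCGAUGCAUCGGAUCC")
    print(len(vec))  # 4 + 16 + 64 + 256 features
    # Hypothetical counts, chosen only to exercise the formulas.
    print(binary_metrics(tp=8, fn=2, tn=9, fp=1))
```

In a jackknife (leave-one-out) test, each sequence is held out once while the classifier is trained on the rest, and the held-out predictions are pooled into the counts passed to `binary_metrics`. When positives and negatives are equal in number, accuracy is the mean of sensitivity and specificity, which is consistent with the reported figures: (90.0 + 96.66) / 2 = 93.33.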

Keywords: Composite features; RNA modification; Support vector machine; Tetra nucleotide.

MeSH terms

  • 5-Methylcytosine / chemistry
  • 5-Methylcytosine / metabolism*
  • Algorithms*
  • Amino Acid Sequence
  • Amino Acids / genetics
  • Base Sequence
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Models, Theoretical
  • Proteins / chemistry
  • Proteins / genetics
  • Proteins / metabolism
  • RNA / chemistry
  • RNA / genetics
  • RNA / metabolism*
  • Support Vector Machine*

Substances

  • Amino Acids
  • Proteins
  • RNA
  • 5-Methylcytosine