Using machine learning and high-throughput RNA sequencing to classify the precursors of small non-coding RNAs

Methods. 2014 May 1;67(1):28-35. doi: 10.1016/j.ymeth.2013.10.002. Epub 2013 Oct 18.


Recent advances in high-throughput sequencing allow researchers to examine the transcriptome in more detail than ever before. Using a method known as high-throughput small RNA-sequencing, we can now profile the expression of small regulatory RNAs such as microRNAs and small interfering RNAs (siRNAs) with high sensitivity. However, many other types of small RNAs (<50 nt) are present in the cell, including fragments derived from snoRNAs (small nucleolar RNAs), snRNAs (small nuclear RNAs), scRNAs (small cytoplasmic RNAs), tRNAs (transfer RNAs), and transposon-derived RNAs. Here, we present a user's guide for CoRAL (Classification of RNAs by Analysis of Length), a computational method for discriminating between different classes of RNA using high-throughput small RNA-sequencing data. Not only can CoRAL distinguish between RNA classes with high accuracy, but it also uses features that are relevant to small RNA biogenesis pathways. By doing so, CoRAL can give biologists a glimpse into the characteristics of different RNA processing pathways and how these might differ between tissue types, biological conditions, or even different species. CoRAL is available at
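As an illustration of the kind of length-based feature the abstract alludes to, the following minimal Python sketch computes the fragment-length distribution and its Shannon entropy for the reads mapped to a single locus. It is not CoRAL's actual feature set or code: the function name `length_features`, the 15-50 nt window, and the toy read lengths are all assumptions made for this example.

```python
import math
from collections import Counter


def length_features(read_lengths, min_len=15, max_len=50):
    """Return (length distribution, entropy in bits) for one locus.

    Hypothetical helper for illustration only; the length window and
    the choice of features are assumptions, not CoRAL's definitions.
    """
    # Keep only reads whose length falls inside the window of interest.
    kept = [n for n in read_lengths if min_len <= n <= max_len]
    if not kept:
        raise ValueError("no reads within the requested length range")

    counts = Counter(kept)
    total = len(kept)

    # Fraction of reads at each length, including lengths with no reads.
    dist = {n: counts.get(n, 0) / total for n in range(min_len, max_len + 1)}

    # Shannon entropy: low when reads pile up at one length,
    # higher when fragment lengths are spread out.
    entropy = -sum(p * math.log2(p) for p in dist.values() if p > 0)
    return dist, entropy


if __name__ == "__main__":
    # Toy data: a tightly sized locus versus a broadly sized one.
    tight = [22] * 90 + [21] * 5 + [23] * 5
    broad = list(range(18, 38)) * 5

    for name, lengths in (("tight", tight), ("broad", broad)):
        _, h = length_features(lengths)
        print(f"{name}: entropy = {h:.2f} bits")
```

A locus whose reads concentrate at a single length yields a low entropy, while one with widely varying fragment lengths yields a higher value. Per-locus features of this kind could then be passed to any standard classifier.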

Keywords: Machine learning; MicroRNAs; Non-coding RNAs; RNA-seq; Small RNAs; Small interfering RNAs.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Animals
  • Artificial Intelligence
  • Base Sequence
  • Decision Trees
  • Entropy
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Inverted Repeat Sequences
  • Molecular Sequence Data
  • Nucleic Acid Conformation
  • RNA Processing, Post-Transcriptional
  • RNA, Small Untranslated / classification*
  • RNA, Small Untranslated / genetics
  • Sequence Analysis, RNA*


Substances

  • RNA, Small Untranslated