Classification of riboswitch sequences using k-mer frequencies

Biosystems. 2018 Dec:174:63-76. doi: 10.1016/j.biosystems.2018.09.001. Epub 2018 Sep 8.

Abstract

Riboswitches are non-coding RNAs that regulate gene expression by altering the structural conformation of mRNA transcripts. Their regulatory mechanism could be exploited for biomedical applications such as drug targets and biosensors. A major challenge is to accurately identify metabolite-binding RNA switches, which are structurally complex and diverse. To this end, we investigated the classification of 16 riboswitch families using supervised learning algorithms trained solely with sequence-based features. We generated a reduced feature set and proposed a visual representation to explore its components. We trained Support Vector Machine, Random Forest, Naive Bayes, J48, and HyperPipes classifiers on our proposed feature set and tested their performance on independent data. Our best multi-class classifier achieved F-measure values of 0.996 and 0.966 in the training and test phases, respectively, outperforming a previous approach. When compared against BLAST, our best classifiers yielded competitive results. This work shows that classifiers trained with our sequence-based feature set accurately discriminate riboswitches.
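
As an illustration of the kind of sequence-based feature the title refers to, the following minimal Python sketch computes relative k-mer frequencies for a single RNA sequence. It is not the authors' implementation: the function name `kmer_frequencies`, the default `k=2`, the conversion of T to U, and the decision to skip windows containing ambiguous bases are all assumptions made for this example. The abstract does not state which values of k were used or how the feature set was reduced.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet assumed for this sketch


def kmer_frequencies(sequence, k=2):
    """Return the relative frequency of every k-mer over ACGU.

    Hypothetical helper for illustration only. Windows containing
    characters outside ACGU (e.g. N) are ignored, and DNA-style T
    is converted to U; both choices are assumptions.
    """
    seq = sequence.upper().replace("T", "U")
    # Enumerate all 4**k possible k-mers in a fixed, reproducible order.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count every overlapping window of length k in the sequence.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Normalise by the number of valid windows only.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


if __name__ == "__main__":
    freqs = kmer_frequencies("GGCUUAUCAAGAGAGGUGGAGGGACUGGCCC", k=2)
    # The values, taken in dictionary order, form a fixed-length feature vector.
    print(list(freqs.values()))
```

Because the dictionary always contains all 4**k k-mers in the same order, the values can be passed as a fixed-length feature vector to any supervised learner, regardless of sequence length.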

Keywords: Non-coding RNA; Riboswitch; Supervised learning; k-mer frequency.

MeSH terms

  • Algorithms*
  • Humans
  • Models, Biological
  • RNA / classification*
  • RNA / genetics*
  • Riboswitch*
  • Supervised Machine Learning

Substances

  • Riboswitch
  • RNA