lncRScan-SVM: A Tool for Predicting Long Non-Coding RNAs Using Support Vector Machine

PLoS One. 2015 Oct 5;10(10):e0139654. doi: 10.1371/journal.pone.0139654. eCollection 2015.

Abstract

Functional long non-coding RNAs (lncRNAs) have brought novel insight into biological research; however, accurately distinguishing lncRNA transcripts (LNCTs) from protein-coding transcripts (PCTs) remains non-trivial. Because previous studies have accumulated a variety of information and data about lncRNAs, it is appealing to develop new methods that identify lncRNAs more accurately. Our method, lncRScan-SVM, classifies PCTs and LNCTs using a support vector machine (SVM). The gold-standard datasets for lncRScan-SVM model training, lncRNA prediction and method comparison were constructed from the GENCODE gene annotations of human and mouse, respectively. By integrating features derived from gene structure, transcript sequence, potential codon sequence and conservation, lncRScan-SVM outperforms other approaches when evaluated by several criteria, including sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under the curve (AUC). In addition, several known human lncRNA datasets were assessed using lncRScan-SVM. lncRScan-SVM is an efficient tool for predicting lncRNAs and is useful for current lncRNA research.
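
For readers unfamiliar with the evaluation criteria named in the abstract, the following is a minimal sketch of how they are commonly computed for a binary classifier. It is not taken from the lncRScan-SVM code. The function names `binary_metrics` and `auc` are hypothetical, and treating LNCTs as the positive class is an assumption made here for illustration only.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier.

    Assumption for illustration: positives are LNCTs, negatives are PCTs.
    tp/fp/tn/fn are counts of true/false positives/negatives.
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }

def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative (ties count as 0.5).
    Both lists are assumed to be non-empty.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

The pairwise AUC is quadratic in the number of transcripts, which is fine for a small illustration; a rank-based formulation would be the usual choice for datasets of realistic size.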

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Support Vector Machine*

Grants and funding

This work was supported by the National Natural Science Foundation of China (61301220 to LS, 61201408 to HL, 61401370 to JM), http://www.nsfc.gov.cn/; the China Fundamental Research Funds for the Central Universities (2014QNA84 to HL, 2014QNB47 to LZ), http://www.moe.edu.cn/; and the Jiangsu Natural Science Foundation (BK20140403 to JM), http://www.jstd.gov.cn/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.