LncRNApred: Classification of Long Non-Coding RNAs and Protein-Coding Transcripts by the Ensemble Algorithm with a New Hybrid Feature

PLoS One. 2016 May 26;11(5):e0154567. doi: 10.1371/journal.pone.0154567. eCollection 2016.

Abstract

As a novel class of noncoding RNAs, long noncoding RNAs (lncRNAs) have been shown to be associated with various diseases. Because large numbers of transcripts are generated every year, it is important to identify lncRNAs accurately and quickly among thousands of assembled transcripts. To discover new lncRNAs accurately, we developed LncRNApred, a random forest (RF) classification tool based on a new hybrid feature set. This hybrid feature set includes three newly proposed features: MaxORF, RMaxORF and SNR. LncRNApred classifies lncRNAs and protein-coding transcripts accurately and quickly. Moreover, our RF model needs to be trained only on human coding and non-coding transcripts, and transcripts from other species can then also be classified with LncRNApred. The results show that our method is more effective than the Coding Potential Calculator (CPC). The web server of LncRNApred is freely available at http://mm20132014.wicp.net:57203/LncRNApred/home.jsp.
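
The abstract names the features but does not define them. As a rough illustration only, the sketch below assumes that MaxORF refers to the length of the longest open reading frame (ATG to the first in-frame stop codon) in the three forward frames of a transcript. The function names `max_orf_length` and `max_orf_ratio` are hypothetical, the length-normalized ratio is an illustrative companion value rather than a confirmed definition of RMaxORF, and none of this is taken from the LncRNApred implementation.

```python
# Hypothetical sketch: longest forward-strand ORF of a transcript.
# Assumption (not stated in the abstract): an ORF runs from an ATG
# to the first in-frame stop codon, stop codon included.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def max_orf_length(seq: str) -> int:
    """Return the length in nucleotides of the longest ATG-to-stop ORF."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # look for the next ORF in this frame
    return best


def max_orf_ratio(seq: str) -> float:
    """Longest ORF length divided by transcript length (illustrative only)."""
    return max_orf_length(seq) / len(seq) if seq else 0.0


if __name__ == "__main__":
    transcript = "CCATGAAATTTTAGGG"  # toy sequence, not real data
    print(max_orf_length(transcript), max_orf_ratio(transcript))
```

Values like these could then be supplied, alongside other features, as inputs to a random forest classifier; the exact feature definitions used by LncRNApred should be taken from the full paper.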

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • RNA, Long Noncoding* / classification
  • RNA, Long Noncoding* / genetics
  • RNA, Messenger* / classification
  • RNA, Messenger* / genetics
  • Sequence Analysis, RNA / methods*
  • Software*

Substances

  • RNA, Long Noncoding
  • RNA, Messenger

Grants and funding

This work was supported by the National Natural Science Foundation of China (11571173, 11401311, 31301229) and the Natural Science Foundation of Jiangsu Province (BK20141358). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.