UClncR: Ultrafast and comprehensive long non-coding RNA detection from RNA-seq

Sci Rep. 2017 Oct 27;7(1):14196. doi: 10.1038/s41598-017-14595-3.

Abstract

Long non-coding RNAs (lncRNAs) are a large class of gene transcripts with regulatory functions discovered in recent years. Many more are expected to be revealed as RNA-seq data accumulate from diverse types of normal and diseased tissues. However, discovering novel lncRNAs and accurately quantifying known lncRNAs from massive RNA-seq data is not trivial. Herein we describe UClncR, an Ultrafast and Comprehensive lncRNA detection pipeline, to tackle this challenge. UClncR takes a standard RNA-seq alignment file, performs transcript assembly, predicts lncRNA candidates, quantifies and annotates both known and novel lncRNA candidates, and generates a convenient report for downstream analysis. The pipeline accommodates both un-stranded and stranded RNA-seq so that lncRNAs overlapping with other genes can be predicted and quantified. UClncR is fully parallelized in a cluster environment yet allows users to run samples sequentially without a cluster. The pipeline can process a typical RNA-seq sample in a matter of minutes and complete hundreds of samples in a matter of hours. Analysis of predicted lncRNAs from two test datasets demonstrated UClncR's accuracy and the relevance of the predicted lncRNAs to sample clinical phenotypes. UClncR would significantly facilitate researchers' discovery of novel lncRNAs and is publicly available at http://bioinformaticstools.mayo.edu/research/UClncR .
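
The workflow described above (per-sample stages run in a fixed order, with samples processed either one after another or in parallel) can be illustrated with a minimal Python sketch. This is not UClncR's code: the stage names are placeholders drawn from the abstract's wording, `process_sample` and `run_pipeline` are hypothetical helpers, and a local thread pool merely stands in for a cluster scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stage names paraphrased from the abstract; they are
# placeholders, not UClncR's real commands or module names.
STAGES = ["assemble", "predict", "quantify", "annotate", "report"]


def process_sample(bam_path):
    """Run every stage in order for one sample and return a stage log."""
    log = []
    for stage in STAGES:
        # A real pipeline would invoke an external tool here; this sketch
        # only records which stage ran on which alignment file.
        log.append(f"{stage}:{bam_path}")
    return log


def run_pipeline(bam_paths, workers=1):
    """Process samples sequentially (workers=1) or concurrently (workers>1)."""
    if workers <= 1:
        return {path: process_sample(path) for path in bam_paths}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with bam_paths.
        results = pool.map(process_sample, bam_paths)
        return dict(zip(bam_paths, results))
```

Because each sample is independent of the others in this sketch, the sequential and parallel modes return the same results; only the wall-clock time would differ.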

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenocarcinoma of Lung / genetics
  • Computational Biology
  • Humans
  • RNA, Long Noncoding / genetics*
  • Sequence Analysis, RNA / methods*
  • Time Factors

Substances

  • RNA, Long Noncoding