Nucleic Acids Res. 2017 May 5;45(8):e57.
doi: 10.1093/nar/gkw1306.

FEELnc: A Tool for Long Non-Coding RNA Annotation and Its Application to the Dog Transcriptome

Free PMC article

Valentin Wucher et al. Nucleic Acids Res.

Abstract

Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. One of the main bottlenecks, however, is to correctly identify the different classes of RNAs among the plethora of reconstructed transcripts, particularly those that will be translated (mRNAs) from the class of long non-coding RNAs (lncRNAs). Here, we present FEELnc (FlExible Extraction of LncRNAs), an alignment-free program that accurately annotates lncRNAs based on a Random Forest model trained with general features such as multi k-mer frequencies and relaxed open reading frames. Benchmarking versus five state-of-the-art tools shows that FEELnc achieves similar or better classification performance on GENCODE and NONCODE data sets. The program also provides specific modules that enable the user to fine-tune classification accuracy, to formalize the annotation of lncRNA classes and to identify lncRNAs even in the absence of a training set of non-coding RNAs. We used FEELnc on a real data set comprising 20 canine RNA-seq samples produced by the European LUPA consortium to substantially expand the canine genome annotation to include 10 374 novel lncRNAs and 58 640 mRNA transcripts. FEELnc moves beyond conventional coding potential classifiers by providing a standardized and complete solution for annotating lncRNAs and is freely available at https://github.com/tderrien/FEELnc.
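The two feature families named in the abstract, multi k-mer frequencies and relaxed open reading frames, can be illustrated with a minimal sketch. This is not FEELnc's implementation: the function names, the choice of k, and the reading of "relaxed" as "an ORF that may run to the end of the transcript without a stop codon" are assumptions made only for illustration.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_frequencies(seq, k):
    """Relative frequency of every A/C/G/T k-mer over overlapping windows.

    Windows containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    alphabet = set("ACGT")
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(w for w in windows if set(w) <= alphabet)
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return {
        "".join(p): counts.get("".join(p), 0) / total
        for p in product("ACGT", repeat=k)
    }


def longest_orf_length(seq, require_stop=True):
    """Length in nucleotides of the longest forward-strand ORF starting at ATG.

    With require_stop=False, an ORF that reaches the end of the transcript
    without a stop codon is also accepted (the "relaxed" case in this sketch).
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # include the stop codon
                start = None
        if start is not None and not require_stop:
            # End of the last complete codon in this frame.
            end = frame + ((len(seq) - frame) // 3) * 3
            best = max(best, end - start)
    return best


def feature_vector(seq, ks=(1, 2, 3)):
    """Concatenate k-mer frequencies for several k with ORF coverage."""
    features = []
    for k in ks:
        freqs = kmer_frequencies(seq, k)
        features.extend(freqs[key] for key in sorted(freqs))
    coverage = longest_orf_length(seq, require_stop=False) / max(len(seq), 1)
    features.append(coverage)
    return features
```

Vectors of this kind could then be passed to any Random Forest implementation together with mRNA and lncRNA labels. The features and parameters FEELnc actually uses are described in the full article and the repository.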

Figures

Figure 1.
FEELnc_codpot and FEELnc_classifier description. (A) Two-graph ROC curves for automatic detection of the optimized CPS threshold and of a user-defined specificity threshold, the latter defining two conservative sets of lncRNAs and mRNAs and a class of transcripts with ambiguous biotypes termed TUCp (Transcripts of Unknown Coding potential). (B) Sub-classification of intergenic and genic lncRNA/transcript interactions by the FEELnc_classifier module.
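The thresholding idea in panel (A) can be sketched as follows. This is a hypothetical illustration, not the FEELnc code: it assumes that a higher CPS means "more coding", that the single optimized cut-off is the score at which sensitivity and specificity are closest, and that the two conservative sets are bounded by two cut-offs (`lnc_max`, `mrna_min`) with everything in between labeled TUCp. All function and parameter names are invented for this example.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff is called mRNA.

    labels: 1 for mRNA (positive class), 0 for lncRNA (negative class).
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def balanced_cutoff(scores, labels):
    """Observed score at which sensitivity and specificity are closest."""
    def gap(cutoff):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        return abs(sens - spec)
    return min(sorted(set(scores)), key=gap)


def assign_biotype(score, lnc_max, mrna_min):
    """Three-way call using two cut-offs; the middle zone is TUCp."""
    if lnc_max > mrna_min:
        raise ValueError("lnc_max must not exceed mrna_min")
    if score >= mrna_min:
        return "mRNA"
    if score <= lnc_max:
        return "lncRNA"
    return "TUCp"


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
    labels = [0, 0, 0, 1, 1, 1]
    print(balanced_cutoff(scores, labels))
    print([assign_biotype(s, 0.3, 0.7) for s in (0.1, 0.5, 0.9)])
```

When the two cut-offs coincide, the TUCp zone disappears and the call reduces to a single-threshold classification.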
Figure 2.
General overview of the FEELnc pipeline. The FEELnc filter module (FEELnc_filter) identifies newly assembled RNA-seq transcripts and removes non-lncRNA transcripts. The FEELnc coding potential module (FEELnc_codpot) computes a coding potential score (CPS) and automatically defines the optimal CPS cut-off to discriminate lncRNAs from mRNAs (and, possibly, TUCps). The FEELnc classifier module (FEELnc_classifier) annotates lncRNA classes based on RNA partners from the reference annotation.
Figure 3.
FEELnc performance against coding potential tools and with shuffle, intergenic and cross-species approaches. (A) ROC curve analysis of FEELnc versus coding potential tools based on the GENCODE human data set (HT). (B) Empirical cumulative distribution of FEELnc_codpot feature scores with the true set of human lncRNAs (‘lncRNA’) in comparison with the ‘shuffle’ and ‘intergenic’ methods. (C) FEELnc_codpot MCC values tested on the human HT set and trained using human mRNAs and species-specific NONCODE lncRNAs (cross-species). The x-axis represents the time of speciation between human and NONCODE species as given in (69). Species abbreviations are the following: Atha: Arabidopsis; Btau: Cow; Cele: Nematode; Dmel: Fly; Drer: Zebrafish; Ggal: Chicken; Ggor: Gorilla; Hsap: Human; Mdom: Opossum; Mmul: Rhesus; Mmus: Mouse; Oana: Platypus; Pabe: Orangutan; Ptro: Chimpanzee; Rnor: Rat.
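For readers unfamiliar with the MCC reported in panel (C), the Matthews correlation coefficient can be computed from a confusion matrix as in the sketch below. The function name and the convention of returning 0.0 when the denominator is zero are choices made for this example, not taken from the article.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when any marginal sum is zero.
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Unlike plain accuracy, this measure uses all four cells of the confusion matrix, so it stays informative when the mRNA and lncRNA classes differ in size.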

References

    1. Djebali S., Davis C.A., Merkel A., Dobin A., Lassmann T., Mortazavi A., Tanzer A., Lagarde J., Lin W., Schlesinger F. et al. Landscape of transcription in human cells. Nature. 2012; 489:101–108.
    2. Pervouchine D.D., Djebali S., Breschi A., Davis C.A., Barja P.P., Dobin A., Tanzer A., Lagarde J., Zaleski C., See L.-H. et al. Enhanced transcriptome maps from multiple mouse tissues reveal evolutionary constraint in gene expression. Nat. Commun. 2015; 6:1–11.
    3. Carninci P., Kasukawa T., Katayama S., Gough J., Frith M.C., Maeda N., Oyama R., Ravasi T., Lenhard B., Wells C. et al. The transcriptional landscape of the mammalian genome. Science. 2005; 309:1559–1563.
    4. Brown J.B., Boley N., Eisman R., May G.E., Stoiber M.H., Duff M.O., Booth B.W., Wen J., Park S., Suzuki A.M. et al. Diversity and dynamics of the Drosophila transcriptome. Nature. 2014; 512:393–399.
    5. Legeai F., Derrien T. Identification of long non-coding RNAs in insects genomes. Curr. Opin. Insect Sci. 2015; 7:37–44.
