Discrimination of non-protein-coding transcripts from protein-coding mRNA
- PMID: 17114936
- DOI: 10.4161/rna.3.1.2789
Abstract
Several recent studies indicate that mammals and other organisms produce large numbers of RNA transcripts that do not correspond to known genes. It has been suggested that these transcripts do not encode proteins, but may instead function as RNAs. However, discrimination of coding and non-coding transcripts is not straightforward, and different laboratories have used different methods, whose ability to perform this discrimination is unclear. In this study, we examine ten bioinformatic methods that assess protein-coding potential and compare their ability and congruency in the discrimination of non-coding from coding sequences, based on four underlying principles: open reading frame size, sequence similarity to known proteins or protein domains, statistical models of protein-coding sequence, and synonymous versus non-synonymous substitution rates. Despite these different approaches, the methods show broad concordance, suggesting that coding and non-coding transcripts can, in general, be reliably discriminated, and that many of the recently discovered extra-genic transcripts are indeed non-coding. Comparison of the methods indicates reasons for unreliable predictions, and approaches to increase confidence further. Conversely and surprisingly, our analyses also provide evidence that as much as approximately 10% of entries in the manually curated protein database Swiss-Prot are erroneous translations of actually non-coding transcripts.
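The first of the four principles named above, open reading frame size, can be illustrated with a short sketch. This is not one of the ten methods evaluated in the study. The function names, the forward-strand-only scan, the requirement of an ATG start codon, and the 100-codon default cutoff are all assumptions made for illustration.

```python
# Illustrative sketch only: a naive ORF-size test, not a method from the study.
# Assumptions: standard stop codons, ATG start, forward strand only,
# and a hypothetical default cutoff of 100 codons.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons (start included, stop excluded) of the longest
    ATG-to-stop open reading frame in the three forward frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None  # ORF closed; look for the next start
    return best


def looks_coding_by_orf_size(seq: str, min_codons: int = 100) -> bool:
    """Flag a transcript as putatively coding if its longest ORF reaches
    the (assumed) minimum length."""
    return longest_orf_codons(seq) >= min_codons


if __name__ == "__main__":
    toy = "CCATGAAATTTGGGTAACC"  # one short ORF: ATG AAA TTT GGG, then TAA
    print(longest_orf_codons(toy))
    print(looks_coding_by_orf_size(toy))
```

A cutoff of this kind misclassifies genuine short proteins as non-coding and long chance ORFs as coding, which is one reason to combine ORF size with the other three principles, as the study does.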