CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model
- PMID: 23335781
- PMCID: PMC3616698
- DOI: 10.1093/nar/gkt006
Abstract
Thousands of novel transcripts have been identified using deep transcriptome sequencing. The discovery of this large, 'hidden' transcriptome has renewed the demand for methods that can rapidly distinguish between coding and noncoding RNA. Here, we present a novel alignment-free method, the Coding-Potential Assessment Tool (CPAT), which rapidly recognizes coding and noncoding transcripts from a large pool of candidates. To this end, CPAT uses a logistic regression model built with four sequence features: open reading frame size, open reading frame coverage, Fickett TESTCODE statistic and hexamer usage bias. CPAT (sensitivity: 0.96, specificity: 0.97) outperformed state-of-the-art alignment-based software such as Coding-Potential Calculator (sensitivity: 0.99, specificity: 0.74) and Phylo Codon Substitution Frequencies (sensitivity: 0.90, specificity: 0.63). In addition to its high accuracy, CPAT is approximately four orders of magnitude faster than Coding-Potential Calculator and Phylo Codon Substitution Frequencies, enabling users to process thousands of transcripts within seconds. The software accepts input sequences in either FASTA- or BED-formatted data files. We also developed a web interface for CPAT that allows users to submit sequences and receive the prediction results almost instantly.
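The way two of the four features feed a logistic model can be illustrated with a minimal sketch. This is not CPAT's implementation: the function names, the ATG-to-stop ORF definition restricted to the three forward frames, and every coefficient below are assumptions made for illustration, and the Fickett and hexamer scores are taken as precomputed inputs rather than derived here.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length (nt, stop codon included) of the longest ATG-to-stop ORF.

    Simplifying assumptions: forward strand only, and ORFs that run off
    the end of the sequence without a stop codon are ignored.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # look for the next ORF in this frame
    return best


def orf_features(seq: str) -> tuple[int, float]:
    """Return (ORF size, ORF coverage = ORF size / transcript length)."""
    size = longest_orf_length(seq)
    coverage = size / len(seq) if seq else 0.0
    return size, coverage


def coding_probability(features, weights, intercept: float) -> float:
    """Logistic regression: sigmoid of a weighted sum of the features."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


# Toy usage. The weights, intercept, Fickett score and hexamer score are
# hypothetical placeholders, not values fitted or published by CPAT.
transcript = "CCATGAAATTTGGGTAACC"
orf_size, orf_coverage = orf_features(transcript)
fickett_score, hexamer_score = 0.9, 0.1
p_coding = coding_probability(
    [orf_size, orf_coverage, fickett_score, hexamer_score],
    weights=[0.01, 1.0, 1.0, 1.0],
    intercept=-2.0,
)
```

In practice the coefficients would be fitted on labeled coding and noncoding transcripts, and a probability cutoff chosen on that training data would turn `p_coding` into a coding or noncoding call.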