OLego: fast and sensitive mapping of spliced mRNA-Seq reads using small seeds
- PMID: 23571760
- PMCID: PMC3664805
- DOI: 10.1093/nar/gkt216
Abstract
A crucial step in analyzing mRNA-Seq data is to accurately and efficiently map hundreds of millions of reads to the reference genome and exon junctions. Here we present OLego, an algorithm specifically designed for de novo mapping of spliced mRNA-Seq reads. OLego adopts a multiple-seed-and-extend scheme, and does not rely on a separate external aligner. It achieves high sensitivity of junction detection by strategic searches with small seeds (~14 nt for mammalian genomes). To improve accuracy and resolve ambiguous mapping at junctions, OLego uses a built-in statistical model to score exon junctions by splice-site strength and intron size. Burrows-Wheeler transform is used in multiple steps of the algorithm to efficiently map seeds, locate junctions and identify small exons. OLego is implemented in C++ with fully multithreaded execution, and allows fast processing of large-scale data. We systematically evaluated the performance of OLego in comparison with published tools using both simulated and real data. OLego demonstrated better sensitivity, higher or comparable accuracy and substantially improved speed. OLego also identified hundreds of novel micro-exons (<30 nt) in the mouse transcriptome, many of which are phylogenetically conserved and can be validated experimentally in vivo. OLego is freely available at http://zhanglab.c2b2.columbia.edu/index.php/OLego.
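The seed-mapping idea in the abstract can be illustrated with a toy example. The sketch below is not OLego's implementation (which is in C++ and is not shown here); it is a minimal, assumed illustration of how a Burrows-Wheeler transform index supports exact lookup of short seeds by backward search. The function names (`build_fm_index`, `find_seed`, `split_seeds`), the naive suffix-array construction, and the non-overlapping seed layout are all hypothetical choices made for readability, and only the ~14 nt default seed size comes from the abstract.

```python
from collections import Counter


def build_fm_index(genome):
    """Build a toy FM-index (suffix array, C table, Occ table) for genome + '$'.

    Naive construction for illustration only; real aligners use compressed,
    sampled structures.
    """
    text = genome + "$"
    # Suffix array by direct sorting of suffixes (fine for toy inputs only).
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    counts = Counter(text)
    # c_table[ch] = number of characters in text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][k] = number of occurrences of ch in bwt[:k].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for k, ch in enumerate(bwt):
        for sym in occ:
            occ[sym][k + 1] = occ[sym][k] + (1 if sym == ch else 0)
    return sa, c_table, occ


def find_seed(seed, sa, c_table, occ):
    """Return sorted 0-based genome positions where seed occurs exactly."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for ch in reversed(seed):  # backward search: extend the match leftwards
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


def split_seeds(read, k=14):
    """Cut a read into non-overlapping k-mers, returned as (offset, seed) pairs."""
    return [(i, read[i:i + k]) for i in range(0, len(read) - k + 1, k)]


if __name__ == "__main__":
    genome = "ACGTACGTTACG"
    index = build_fm_index(genome)
    for offset, seed in split_seeds("ACGTTACG", k=4):
        print(offset, seed, find_seed(seed, *index))
```

In a seed-and-extend mapper, hits for seeds from the same read would then be chained and the gaps between them examined for candidate junctions; that step, and the junction scoring model mentioned in the abstract, are beyond this sketch.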
Similar articles
- JAGuaR: junction alignments to genome for RNA-seq reads. PLoS One. 2014 Jul 25;9(7):e102398. doi: 10.1371/journal.pone.0102398. eCollection 2014. PMID: 25062255 Free PMC article.
- Supersplat--spliced RNA-seq alignment. Bioinformatics. 2010 Jun 15;26(12):1500-5. doi: 10.1093/bioinformatics/btq206. Epub 2010 Apr 21. PMID: 20410051 Free PMC article.
- Systematic evaluation of spliced alignment programs for RNA-seq data. Nat Methods. 2013 Dec;10(12):1185-91. doi: 10.1038/nmeth.2722. Epub 2013 Nov 3. PMID: 24185836 Free PMC article.
- Mapping RNA-seq Reads with STAR. Curr Protoc Bioinformatics. 2015 Sep 3;51:11.14.1-11.14.19. doi: 10.1002/0471250953.bi1114s51. PMID: 26334920 Free PMC article. Review.
- Overview of available methods for diverse RNA-Seq data analyses. Sci China Life Sci. 2011 Dec;54(12):1121-8. doi: 10.1007/s11427-011-4255-x. Epub 2012 Jan 7. PMID: 22227904 Review.
Cited by
- Lessons from non-canonical splicing. Nat Rev Genet. 2016 Jul;17(7):407-421. doi: 10.1038/nrg.2016.46. Epub 2016 May 31. PMID: 27240813 Free PMC article. Review.
- Isolation and characterization of human embryonic stem cell-derived heart field-specific cardiomyocytes unravels new insights into their transcriptional and electrophysiological profiles. Cardiovasc Res. 2022 Feb 21;118(3):828-843. doi: 10.1093/cvr/cvab102. PMID: 33744937 Free PMC article.
- Isoform-specific functions of an evolutionarily conserved 3 bp micro-exon alternatively spliced from another exon in Drosophila homothorax gene. Sci Rep. 2020 Jul 30;10(1):12783. doi: 10.1038/s41598-020-69644-1. PMID: 32732884 Free PMC article.
- The impact of RNA-seq aligners on gene expression estimation. ACM BCB. 2015 Sep;2015:462-471. doi: 10.1145/2808719.2808767. PMID: 27583310 Free PMC article.
- An RNA Switch of a Large Exon of Ninein Is Regulated by the Neural Stem Cell Specific-RNA Binding Protein, Qki5. Int J Mol Sci. 2019 Feb 26;20(5):1010. doi: 10.3390/ijms20051010. PMID: 30813567 Free PMC article.
