TopHat: discovering splice junctions with RNA-Seq
- PMID: 19289445
- PMCID: PMC2672628
- DOI: 10.1093/bioinformatics/btp120
Abstract
Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or 'reads', can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.
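The abstract does not describe how TopHat itself finds junctions. To make "without relying on known splice sites" concrete, here is a deliberately naive, brute-force illustration. It is not TopHat's method. It assumes exact matching and the canonical GT-AG intron boundary, and it searches for a way to split a read into two exact genome matches separated by such an intron. The function name and its parameters (`min_anchor`, `min_intron`, `max_intron`) are hypothetical.

```python
def find_spliced_alignments(read, genome, min_anchor=4, min_intron=10, max_intron=1000):
    """Brute-force search for spliced placements of `read` in `genome`.

    Illustrative sketch only (not TopHat's algorithm). Returns a list of
    (start, donor, acceptor) tuples such that
        read == genome[start:donor] + genome[acceptor:acceptor + len(read) - (donor - start)]
    and the intron genome[donor:acceptor] begins with 'GT' and ends with 'AG'.
    """
    hits = []
    n = len(read)
    # Both pieces of the split read must be at least `min_anchor` bases long.
    for split in range(min_anchor, n - min_anchor + 1):
        left, right = read[:split], read[split:]
        # Every exact occurrence of the left piece is a candidate upstream exon end.
        start = genome.find(left)
        while start != -1:
            donor = start + split  # first base of the candidate intron
            # The candidate intron must begin with the canonical GT donor dinucleotide.
            if genome.startswith("GT", donor):
                for intron_len in range(min_intron, max_intron + 1):
                    acceptor = donor + intron_len  # first base of the downstream exon
                    if acceptor + len(right) > len(genome):
                        break  # longer introns would run off the end of the genome
                    # The intron must end with AG and be followed by the right piece.
                    if genome[acceptor - 2:acceptor] == "AG" and genome.startswith(right, acceptor):
                        hits.append((start, donor, acceptor))
            start = genome.find(left, start + 1)
    return hits


# Toy genome: exon 1, a 24-base GT...AG intron, exon 2, with short flanks.
genome = "CCCC" + "ACGTTGCAAC" + "GT" + "C" * 20 + "AG" + "TTGACCATGG" + "CCCC"
read = "TGCAACTTGACC"  # last 6 bases of exon 1 joined to first 6 bases of exon 2
print(find_spliced_alignments(read, genome))  # [(8, 14, 38)]
```

The search cost grows with read length, the number of occurrences of each left piece, and the allowed intron length. That cost is why a naive scan like this does not scale to millions of reads against a mammalian genome, and why practical aligners need smarter indexing and candidate filtering.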
Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20,000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development.
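Both kinds of figure in the Results reduce to simple arithmetic. The helpers below are hypothetical and assume that junctions are represented as (chromosome, donor, acceptor) tuples. They compute three things: the recall against a reference junction set (here, the junctions reported by another pipeline), the set of predicted junctions absent from that reference, and the CPU time implied by the reported rate of about 2.2 million reads per CPU hour.

```python
def junction_recall(predicted, reference):
    """Return (recall, unreported) for two collections of junction tuples.

    recall     = fraction of reference junctions also found in `predicted`
    unreported = predicted junctions that do not appear in `reference`
    """
    predicted, reference = set(predicted), set(reference)
    recovered = predicted & reference
    recall = len(recovered) / len(reference) if reference else 0.0
    unreported = predicted - reference
    return recall, unreported


def cpu_hours_needed(n_reads, reads_per_cpu_hour=2_200_000):
    """CPU hours to map `n_reads` at the throughput quoted in the abstract."""
    return n_reads / reads_per_cpu_hour


# Hypothetical junction sets, for illustration only.
pred = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 5, 50)]
ref = [("chr1", 100, 200), ("chr1", 300, 400), ("chr1", 500, 600), ("chr3", 1, 9)]
print(junction_recall(pred, ref))  # (0.5, {('chr2', 5, 50)})
print(cpu_hours_needed(2_200_000 * 24))  # 24.0
```

At that rate, one CPU-day covers 2.2 million × 24 = 52.8 million reads.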
Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Similar articles
- Read-Split-Run: an improved bioinformatics pipeline for identification of genome-wide non-canonical spliced regions using RNA-Seq data. BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):503. doi: 10.1186/s12864-016-2896-7. PMID: 27556805. Free PMC article.
- MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 2010 Oct;38(18):e178. doi: 10.1093/nar/gkq622. Epub 2010 Aug 27. PMID: 20802226. Free PMC article.
- Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM). Bioinformatics. 2011 Sep 15;27(18):2518-28. doi: 10.1093/bioinformatics/btr427. Epub 2011 Jul 19. PMID: 21775302. Free PMC article.
- Mapping RNA-seq Reads with STAR. Curr Protoc Bioinformatics. 2015 Sep 3;51:11.14.1-11.14.19. doi: 10.1002/0471250953.bi1114s51. PMID: 26334920. Free PMC article. Review.
- Mapping RNA-seq reads to transcriptomes efficiently based on learning to hash method. Comput Biol Med. 2020 Jan;116:103539. doi: 10.1016/j.compbiomed.2019.103539. Epub 2019 Nov 13. PMID: 31765913. Review.
Cited by 6,021 articles
- HOXBLINC long non-coding RNA activation promotes leukemogenesis in NPM1-mutant acute myeloid leukemia. Nat Commun. 2021 Mar 29;12(1):1956. doi: 10.1038/s41467-021-22095-2. PMID: 33782403.
- A Dual Systems Genetics Approach Identifies Common Genes, Networks, and Pathways for Type 1 and 2 Diabetes in Human Islets. Front Genet. 2021 Mar 10;12:630109. doi: 10.3389/fgene.2021.630109. eCollection 2021. PMID: 33777101. Free PMC article.
- In Vitro Culture Expansion Shifts the Immune Phenotype of Human Adipose-Derived Mesenchymal Stem Cells. Front Immunol. 2021 Mar 10;12:621744. doi: 10.3389/fimmu.2021.621744. eCollection 2021. PMID: 33777002. Free PMC article.
- The oncogene AAMDC links PI3K-AKT-mTOR signaling with metabolic reprograming in estrogen receptor-positive breast cancer. Nat Commun. 2021 Mar 26;12(1):1920. doi: 10.1038/s41467-021-22101-7. PMID: 33772001.
- Methylation and molecular profiles of ependymoma: Influence of patient age and tumor anatomic location. Mol Clin Oncol. 2021 May;14(5):88. doi: 10.3892/mco.2021.2250. Epub 2021 Mar 5. PMID: 33767857. Free PMC article.
