Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study
- PMID: 22373417
- PMCID: PMC3287467
- DOI: 10.1186/1471-2105-12-S14-S2
Abstract
Background: With rapid advances in next-generation sequencing technology, high-throughput RNA sequencing (RNA-Seq) has emerged as a powerful and cost-effective approach to transcriptome studies. De novo assembly of transcripts provides an important solution for transcriptome analysis in organisms with no reference genome. However, it was poorly understood how different variables affect assembly outcomes, and there was no consensus on how to reach an optimal solution by selecting a software tool and a suitable strategy based on the properties of the RNA-Seq data.
Results: To assess the performance of different programs for transcriptome assembly, this work analyzed several important factors, including k-mer values, genome complexity, coverage depth, and directional reads. Seven program conditions were tested: four single k-mer assemblers (SK: SOAPdenovo, ABySS, Oases and Trinity) and three multiple k-mer methods (MK: SOAPdenovo-MK, trans-ABySS and Oases-MK). Small k-mer values performed better for reconstructing lowly expressed transcripts and large k-mer values performed better for highly expressed transcripts, whereas the MK strategy worked well across almost all expression quintiles. Among the SK tools, Trinity performed well across various conditions but had the longest runtime. Oases consumed the most memory, whereas SOAPdenovo had the shortest runtime but performed poorly at reconstructing full-length CDS. ABySS showed a good balance between resource usage and assembly quality.
Conclusions: Our work compared the performance of publicly available transcriptome assemblers and analyzed important factors affecting de novo assembly. We propose practical guidelines for transcript reconstruction from short-read RNA-Seq data. De novo assembly of the C. sinensis transcriptome was greatly improved using the optimized methods.
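The multiple k-mer (MK) idea in the Results can be illustrated with a minimal sketch: contigs assembled at several k-mer values are pooled, and any contig fully contained in a longer one is dropped. This is a simplified illustration under stated assumptions, not the merging procedure used by SOAPdenovo-MK, trans-ABySS or Oases-MK. The function names `reverse_complement` and `merge_multi_k`, the exact-substring redundancy rule, and the toy sequences are all hypothetical. Collapsing reverse complements assumes non-directional reads.

```python
from typing import Dict, List


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.translate(complement)[::-1]


def merge_multi_k(assemblies: Dict[int, List[str]]) -> List[str]:
    """Pool contigs from several single k-mer assemblies and drop redundancy.

    `assemblies` maps a k-mer value to the contigs assembled with that k.
    A contig is discarded when it, or its reverse complement, is an exact
    substring of a longer contig that has already been kept (an assumed,
    deliberately simple redundancy rule).
    """
    pooled = [c.upper() for contigs in assemblies.values() for c in contigs]
    # Longest first so shorter fragments are compared against longer contigs;
    # ties are broken alphabetically to keep the result deterministic.
    pooled.sort(key=lambda s: (-len(s), s))

    kept: List[str] = []
    for contig in pooled:
        rc = reverse_complement(contig)
        if any(contig in longer or rc in longer for longer in kept):
            continue
        kept.append(contig)
    return kept


if __name__ == "__main__":
    # Toy contigs: a small k recovers a short fragment, a larger k a longer one.
    toy = {
        21: ["ACGTTGCA", "GGGCCC"],
        31: ["ACGTTGCAAT", "TTGC"],
    }
    print(merge_multi_k(toy))
```

A quadratic substring scan like this is only practical for small inputs; real pipelines typically cluster contigs by sequence similarity rather than exact containment.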
