Meraculous: de novo genome assembly with short paired-end reads
- PMID: 21876754
- PMCID: PMC3158087
- DOI: 10.1371/journal.pone.0023501
Abstract
We describe a new algorithm, meraculous, for whole genome assembly of deep paired-end short reads, and apply it to the assembly of a dataset of paired 75-bp Illumina reads derived from the 15.4 megabase genome of the haploid yeast Pichia stipitis. More than 95% of the genome is recovered, with no errors; half the assembled sequence is in contigs longer than 101 kilobases and in scaffolds longer than 269 kilobases. Incorporating fosmid ends recovers entire chromosomes. Meraculous relies on an efficient and conservative traversal of a subgraph of the k-mer (de Bruijn) graph: the subgraph of oligonucleotides that have unique high-quality extensions in the dataset. This avoids the explicit error correction step used in other short-read assemblers. A novel memory-efficient hashing scheme is introduced. The resulting contigs are ordered and oriented using paired reads separated by ∼280 bp or ∼3.2 kbp, and many gaps between contigs can be closed using paired-end placements. Practical issues with the dataset are described, and prospects for assembling larger genomes are discussed.
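The idea of traversing only k-mers with unique high-quality extensions can be illustrated with a small sketch. This is not the meraculous implementation: the function names, the `min_qual` and `min_count` thresholds, and the single-strand simplification (reverse complements are ignored) are all assumptions made for illustration. Reads are given as `(sequence, quality_list)` pairs.

```python
from collections import defaultdict


def count_extensions(reads, k, min_qual=20):
    """For each k-mer, count the high-quality bases seen just before and just after it.

    `reads` is an iterable of (sequence, qualities) pairs. `min_qual` is an
    illustrative threshold, not a value taken from the paper.
    """
    left = defaultdict(lambda: defaultdict(int))
    right = defaultdict(lambda: defaultdict(int))
    for seq, quals in reads:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Count the following base only if its quality is high enough.
            if i + k < len(seq) and quals[i + k] >= min_qual:
                right[kmer][seq[i + k]] += 1
            # Likewise for the preceding base.
            if i > 0 and quals[i - 1] >= min_qual:
                left[kmer][seq[i - 1]] += 1
    return left, right


def unique_extension(counts, min_count=2):
    """Return the only base supported by at least `min_count` observations, else None."""
    supported = [base for base, n in counts.items() if n >= min_count]
    return supported[0] if len(supported) == 1 else None


def extend_right(seed, left, right, min_count=2):
    """Walk rightwards from `seed` while every step is unique in both directions."""
    contig, kmer, seen = seed, seed, {seed}
    while True:
        base = unique_extension(right.get(kmer, {}), min_count)
        if base is None:
            break  # dead end or fork: stop rather than guess
        nxt = kmer[1:] + base
        # Conservative check: the next k-mer must point back uniquely to this one.
        back = unique_extension(left.get(nxt, {}), min_count)
        if nxt in seen or back != kmer[0]:
            break
        contig += base
        seen.add(nxt)
        kmer = nxt
    return contig
```

In this sketch a low-quality base never contributes to an extension count, so an isolated sequencing error is simply ignored during traversal instead of being corrected beforehand. A genuine fork, where two different bases are each well supported, terminates the contig.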
