BMC Bioinformatics. 2012 Apr 19;13 Suppl 6(Suppl 6):S1.
doi: 10.1186/1471-2105-13-S6-S1.

Exploiting Sparseness in De Novo Genome Assembly

Chengxi Ye et al. BMC Bioinformatics. 2012.
Free PMC article

Abstract

Background: The very large memory requirements for the construction of assembly graphs for de novo genome assembly limit current algorithms to super-computing environments.

Methods: In this paper, we demonstrate that constructing a sparse assembly graph, which stores only a small fraction of the observed k-mers as nodes together with the links between these nodes, allows the de novo assembly of even moderately-sized genomes (~500 M) on a typical laptop computer.
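The idea of keeping only a fraction of the k-mers as nodes can be illustrated with a minimal Python sketch. It is not SparseAssembler's actual implementation: the function name `build_sparse_kmer_graph`, the skip parameter `g`, and the rule of reusing any k-mer that is already a node are assumptions made for illustration, and reverse complements, sequencing errors, and coverage counts are all ignored.

```python
from collections import defaultdict

def build_sparse_kmer_graph(reads, k, g):
    """Hypothetical sketch: keep about one k-mer in every g as a node.

    Each edge is labelled with the bases that lie between two consecutive
    node k-mers in a read, so the skipped k-mers are never stored.
    """
    nodes = set()
    edges = defaultdict(set)  # node k-mer -> {(link_bases, next_node_kmer)}
    for read in reads:
        prev_pos = None  # start position of the last node seen in this read
        for pos in range(len(read) - k + 1):
            kmer = read[pos:pos + k]
            already_node = kmer in nodes
            due = prev_pos is None or pos - prev_pos >= g
            if not (already_node or due):
                continue  # skipped k-mer: not stored as a node
            nodes.add(kmer)
            if prev_pos is not None:
                prev = read[prev_pos:prev_pos + k]
                # Bases needed to extend prev until it ends where kmer ends.
                link = read[prev_pos + k:pos + k]
                edges[prev].add((link, kmer))
            prev_pos = pos
    return nodes, dict(edges)

if __name__ == "__main__":
    nodes, edges = build_sparse_kmer_graph(["ACGTTGCAAGCT"], k=3, g=4)
    print(sorted(nodes))
    print(edges)
```

In this toy read there are ten distinct 3-mers, but only three become nodes when `g` is 4; the other seven are recoverable from the edge labels. The memory saving reported in the abstract depends on the real data structure and parameters, which this sketch does not reproduce.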

Results: We implement this sparse graph concept in a proof-of-principle software package, SparseAssembler, utilizing a new sparse k-mer graph structure evolved from the de Bruijn graph. We test our SparseAssembler with both simulated and real data, achieving ~90% memory savings and retaining high assembly accuracy, without sacrificing speed in comparison to existing de novo assemblers.

Figures

Figure 1
From overlap graph to a string graph. (a) an overlap graph, in which all the overlaps are recorded. (b) the string graph, transitive overlap (a, c) is removed.
Figure 2
A node with branches in the de Bruijn graph and the sparse k-mer graph. (a) A node with branches in a de Bruijn graph. (b) The binary implementation of (a). (c) A node with branches in a sparse k-mer graph. (d) The binary implementation of (c). The k-mers which are nodes in the graph are squared in the blocks. Neighbouring nucleotides indicating the edges of the graph are circled.
Figure 3
Breadth-first search bubble removal in the sparse k-mer graph. Removing unwanted structures in the sparse de Bruijn graph. (a) Before removal. (b) After removal.
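The Figure 2 caption refers to a binary implementation of graph nodes. The exact layout used by the authors is not given in this record, so the following Python sketch only shows one common convention, assumed here for illustration: packing each nucleotide into 2 bits so that a k-mer fits in a single integer. The names `pack_kmer` and `unpack_kmer` and the A=0, C=1, G=2, T=3 mapping are hypothetical choices.

```python
# Assumed 2-bit code per nucleotide; the authors' actual mapping may differ.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASE = "ACGT"

def pack_kmer(kmer):
    """Pack a k-mer into an integer, 2 bits per base, first base most significant."""
    value = 0
    for base in kmer:
        value = (value << 2) | _CODE[base]
    return value

def unpack_kmer(value, k):
    """Recover the k-mer string of length k from its packed integer form."""
    bases = []
    for _ in range(k):
        bases.append(_BASE[value & 3])  # lowest 2 bits hold the last base
        value >>= 2
    return "".join(reversed(bases))

if __name__ == "__main__":
    packed = pack_kmer("ACGT")
    print(packed, unpack_kmer(packed, 4))
```

With this encoding a k-mer of up to 32 bases fits in one 64-bit word, which is why compact binary node representations are attractive for memory-limited assembly.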
