Exploiting sparseness in de novo genome assembly

BMC Bioinformatics. 2012 Apr 19;13 Suppl 6(Suppl 6):S1. doi: 10.1186/1471-2105-13-S6-S1.

Abstract

Background: The very large memory requirements for the construction of assembly graphs for de novo genome assembly limit current algorithms to super-computing environments.

Methods: In this paper, we demonstrate that constructing a sparse assembly graph, one that stores only a small fraction of the observed k-mers as nodes together with the links between those nodes, allows the de novo assembly of even moderately sized genomes (~500 M) on a typical laptop computer.

Results: We implement this sparse graph concept in a proof-of-principle software package, SparseAssembler, which uses a new sparse k-mer graph structure evolved from the de Bruijn graph. We test SparseAssembler on both simulated and real data, achieving ~90% memory savings while retaining high assembly accuracy, without sacrificing speed in comparison to existing de novo assemblers.
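The idea of keeping only a fraction of the observed k-mers as nodes, plus the links between them, can be illustrated with a minimal Python sketch. It is not the SparseAssembler implementation: the function names, the fixed sampling stride `g`, and the link representation are assumptions made here for illustration only. The sketch also ignores reverse complements, sequencing errors, and the need for overlapping reads to agree on which k-mers are kept, all of which a real assembler has to handle.

```python
from collections import defaultdict


def sparse_kmer_graph(reads, k, g):
    """Toy sparse k-mer graph (illustrative assumption, not the paper's code).

    Keeps the k-mer at every g-th position of each read as a node and links
    each kept k-mer to the next kept one, labelling the link with the bases
    that must be appended to get from one to the other.
    """
    nodes = set()
    links = defaultdict(set)  # kept k-mer -> {(appended bases, next kept k-mer)}
    for read in reads:
        prev_kmer, prev_pos = None, None
        for pos in range(0, len(read) - k + 1, g):
            kmer = read[pos:pos + k]
            nodes.add(kmer)
            if prev_kmer is not None:
                # Bases lying between the end of the previous kept k-mer
                # and the end of the current one.
                appended = read[prev_pos + k:pos + k]
                links[prev_kmer].add((appended, kmer))
            prev_kmer, prev_pos = kmer, pos
    return nodes, dict(links)


def count_all_kmers(reads, k):
    """Number of distinct k-mers a dense (de Bruijn style) graph would store."""
    return len({r[i:i + k] for r in reads for i in range(len(r) - k + 1)})


if __name__ == "__main__":
    reads = ["ACGTTGCAAT"]
    nodes, links = sparse_kmer_graph(reads, k=4, g=3)
    print("sparse nodes:", len(nodes))
    print("dense nodes: ", count_all_kmers(reads, k=4))
    for src, edges in links.items():
        for appended, dst in sorted(edges):
            print(f"{src} --{appended}--> {dst}")
```

In this toy setting the number of stored nodes shrinks roughly in proportion to the stride `g`, at the cost of storing a short sequence label on each link. How this trade-off plays out on real data depends on details not covered by the sketch.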

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Computer Storage Devices*
  • Escherichia coli / genetics
  • Genome*
  • Genome, Human
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Sequence Analysis, DNA
  • Software*