Distilled single-cell genome sequencing and de novo assembly for sparse microbial communities

Bioinformatics. 2013 Oct 1;29(19):2395-401. doi: 10.1093/bioinformatics/btt420. Epub 2013 Aug 5.


Motivation: Identification of every single genome present in a microbial sample is an important and challenging task with crucial applications. It is challenging because there are typically millions of cells in a microbial sample, the vast majority of which elude cultivation. The most accurate method to date is exhaustive single-cell sequencing using multiple displacement amplification, which is simply intractable for a large number of cells. However, there is hope for breaking this barrier, as the number of different cell types with distinct genome sequences is usually much smaller than the number of cells.

Results: Here, we present a novel divide-and-conquer method to sequence and de novo assemble all distinct genomes present in a microbial sample, with a sequencing cost and computational complexity proportional to the number of genome types rather than the number of cells. The method is implemented in a tool called Squeezambler. We evaluated Squeezambler on simulated data. The proposed divide-and-conquer method successfully reduces the cost of sequencing in comparison with the naïve exhaustive approach.
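
The cost claim above can be illustrated with a toy model. The sketch below is not Squeezambler's algorithm, which this abstract does not describe; it only assumes a hypothetical, idealized "distillation" step that removes every remaining cell sharing a genome type with a cell that has already been sequenced. All function names (make_sample, exhaustive_cost, distilled_cost) and the sample sizes are illustrative assumptions. Under that assumption, the number of sequencing runs equals the number of genome types, whereas the exhaustive approach needs one run per cell.

```python
import random


def make_sample(num_cells, num_types, rng):
    """Build a toy sample: each cell is labeled by its genome type (an int).

    Every type is guaranteed to appear at least once (illustrative assumption).
    """
    cells = list(range(num_types))
    cells += [rng.randrange(num_types) for _ in range(num_cells - num_types)]
    rng.shuffle(cells)
    return cells


def exhaustive_cost(cells):
    """Naive approach: one sequencing run per cell."""
    return len(cells)


def distilled_cost(cells, rng):
    """Toy distillation loop (hypothetical, idealized).

    Sequence one randomly chosen cell, then discard every remaining cell of
    the same genome type, and repeat until no cells are left.
    Returns the number of sequencing runs and the types recovered, in order.
    """
    remaining = list(cells)
    recovered = []
    while remaining:
        picked = rng.choice(remaining)
        recovered.append(picked)
        # Idealized filter: assumes perfect detection of already-seen genomes.
        remaining = [c for c in remaining if c != picked]
    return len(recovered), recovered


if __name__ == "__main__":
    rng = random.Random(0)
    sample = make_sample(num_cells=10_000, num_types=12, rng=rng)
    runs, _ = distilled_cost(sample, rng)
    print("exhaustive runs:", exhaustive_cost(sample))
    print("distilled runs:", runs)
```

In this idealized setting the distilled loop always needs exactly as many runs as there are genome types, regardless of how many cells the sample contains. A real workflow would also have to cope with imperfect filtering, amplification bias and assembly errors, which this sketch ignores.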

Availability: Squeezambler and datasets are available at http://compbio.cs.wayne.edu/software/squeezambler/.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Base Sequence
  • Genome, Microbial*
  • Humans
  • Intestines / microbiology
  • Sequence Analysis, DNA / methods*
  • Sequence Homology, Nucleic Acid