Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data

Nat Methods. 2013 Jun;10(6):563-9. doi: 10.1038/nmeth.2474. Epub 2013 May 5.

Abstract

We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
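The seed-selection idea and the accuracy figure can be illustrated with a minimal sketch. It is not the HGAP implementation: the function names, the `target_seed_coverage` parameter, and the rule of taking the longest reads until they cover the genome a fixed number of times are assumptions made for illustration, since the abstract says only that the longest reads serve as seeds. The second helper converts an accuracy into a Phred-scaled quality value using the standard formula QV = -10 log10(1 - accuracy), under which 99.999% accuracy corresponds to about QV 50.

```python
import math


def select_seed_reads(read_lengths, genome_size, target_seed_coverage=30.0):
    """Pick the longest reads as seeds (hypothetical rule, for illustration).

    Reads are taken from longest to shortest until their summed length
    reaches target_seed_coverage * genome_size. Returns the length of the
    shortest selected seed (the cutoff) and the list of seed lengths.
    """
    needed_bases = target_seed_coverage * genome_size
    seeds = []
    total = 0
    for length in sorted(read_lengths, reverse=True):
        if total >= needed_bases:
            break
        seeds.append(length)
        total += length
    cutoff = seeds[-1] if seeds else 0
    return cutoff, seeds


def accuracy_to_phred(accuracy):
    """Convert a per-base accuracy (0 <= accuracy < 1) to a Phred-scaled QV."""
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    # Toy numbers only; real read sets contain many thousands of reads.
    lengths = [1000, 8000, 3000, 12000, 500, 9000]
    cutoff, seeds = select_seed_reads(lengths, genome_size=1000,
                                      target_seed_coverage=20.0)
    print("seed length cutoff:", cutoff)
    print("seed reads:", seeds)
    print("QV for 99.999% accuracy:", round(accuracy_to_phred(0.99999), 2))
```

In this sketch the reads below the cutoff are not discarded; in the workflow the abstract describes, they are the reads recruited to the seeds to build the consensus for each preassembled read.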

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Chromosomes, Artificial, Bacterial
  • Escherichia coli / genetics
  • Gene Library
  • Genome, Bacterial*
  • Humans
  • Repetitive Sequences, Nucleic Acid
  • Sequence Analysis, DNA / methods*

Associated data

  • SRA/SRR811719
  • SRA/SRR811720
  • SRA/SRR811743
  • SRA/SRR811744
  • SRA/SRR811745
  • SRA/SRR811746
  • SRA/SRR811747
  • SRA/SRR811770
  • SRA/SRR811863
  • SRA/SRR811864
  • SRA/SRR811865
  • SRA/SRR811890
  • SRA/SRR811935
  • SRA/SRR811936
  • SRA/SRR811937
  • SRA/SRR811960
  • SRA/SRR811961
  • SRA/SRR811962
  • SRA/SRR811963
  • SRA/SRR812176
  • SRA/SRR812197