De novo sequencing and variant calling with nanopores using PoreSeq

Nat Biotechnol. 2015 Oct;33(10):1087-91. doi: 10.1038/nbt.3360. Epub 2015 Sep 9.

Abstract

The accuracy of sequencing single DNA molecules with nanopores is continually improving, but de novo genome sequencing and assembly using only nanopore data remain challenging. Here we describe PoreSeq, an algorithm that identifies and corrects errors in nanopore sequencing data and improves the accuracy of de novo genome assembly with increasing coverage depth. The approach relies on modeling the possible sources of uncertainty that occur as DNA transits through the nanopore and finds the sequence that best explains multiple reads of the same region. PoreSeq increases nanopore sequencing read accuracy of M13 bacteriophage DNA from 85% to 99% at 100× coverage. We also use the algorithm to assemble Escherichia coli with 30× coverage and the λ genome at a range of coverages from 3× to 50×. Additionally, we classify sequence variants at an order of magnitude lower coverage than is possible with existing methods.
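The effect of coverage depth on consensus accuracy can be illustrated with a toy simulation. This is a minimal sketch under strong simplifying assumptions and is not the PoreSeq algorithm: reads are assumed to be already aligned and of equal length, errors are substitutions only (real nanopore reads also contain insertions and deletions), and the consensus is a per-position majority vote rather than a model of the uncertainty that arises as DNA transits the pore. The function names, the 15% per-base error rate (chosen to mirror the 85% single-read accuracy quoted above), and the sequence length are all hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"


def noisy_read(truth, error_rate, rng):
    """Copy `truth`, substituting a different base at each position with probability `error_rate`."""
    out = []
    for base in truth:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def majority_consensus(reads):
    """Return the most common base in each column of equal-length, pre-aligned reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


def accuracy(seq, truth):
    """Fraction of positions at which `seq` matches `truth`."""
    return sum(a == b for a, b in zip(seq, truth)) / len(truth)


def consensus_accuracy(coverage, length=2000, error_rate=0.15, seed=0):
    """Simulate `coverage` noisy reads of a random sequence and score their majority consensus."""
    rng = random.Random(seed)
    truth = "".join(rng.choice(BASES) for _ in range(length))
    reads = [noisy_read(truth, error_rate, rng) for _ in range(coverage)]
    return accuracy(majority_consensus(reads), truth)


if __name__ == "__main__":
    # Print consensus accuracy at several coverage depths.
    for cov in (1, 3, 10, 30, 100):
        print(cov, round(consensus_accuracy(cov), 4))
```

Under these assumptions the consensus accuracy rises quickly with coverage, because independent substitution errors rarely agree at the same position. The numbers it prints are not comparable to the figures reported in the abstract, which come from real nanopore data.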

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Base Sequence
  • Chromosome Mapping / methods*
  • DNA, Viral / genetics*
  • DNA, Viral / ultrastructure
  • Genetic Variation / genetics*
  • Molecular Sequence Data
  • Nanopores / ultrastructure*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Analysis, DNA / methods*

Substances

  • DNA, Viral