NxRepair: error correction in de novo sequence assembly using Nextera mate pairs

PeerJ. 2015 Jun 2;3:e996. doi: 10.7717/peerj.996. eCollection 2015.

Abstract

Scaffolding errors and incorrect repeat disambiguation during de novo assembly can result in large-scale misassemblies in draft genomes. Nextera mate pair sequencing data provide additional information to resolve assembly ambiguities during scaffolding. Here, we introduce NxRepair, an open source toolkit for error correction in de novo assemblies that uses Nextera mate pair libraries to identify and correct large-scale errors. We show that NxRepair can identify and correct large scaffolding errors without the use of a reference sequence, resulting in quantitative improvements in assembly quality. NxRepair can be downloaded from GitHub or PyPI, the Python Package Index; a tutorial and user documentation are also available.
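
The abstract does not describe how NxRepair works internally, so the following is only a minimal sketch of the general idea behind insert-size-based misassembly detection, not NxRepair's actual algorithm or API. It assumes mate pairs have already been aligned to a contig and reduced to (start, end) spans; the function name, parameters and thresholds are all hypothetical.

```python
def flag_suspect_bins(pairs, contig_length, expected_insert, tolerance,
                      bin_size=100, min_pairs=3, max_bad_fraction=0.5):
    """Return start coordinates of bins where most spanning mate pairs
    have an insert size far from the expected value.

    Hypothetical illustration only; not taken from NxRepair.
    pairs: list of (start, end) spans of aligned mate pairs on one contig.
    """
    flagged = []
    for bin_start in range(0, contig_length, bin_size):
        midpoint = bin_start + bin_size // 2
        # Mate pairs whose span covers the midpoint of this bin.
        spanning = [(s, e) for s, e in pairs if s <= midpoint < e]
        if len(spanning) < min_pairs:
            continue  # too little evidence to judge this bin
        # A pair is anomalous if its insert size deviates by more than tolerance.
        bad = sum(1 for s, e in spanning
                  if abs((e - s) - expected_insert) > tolerance)
        if bad / len(spanning) > max_bad_fraction:
            flagged.append(bin_start)
    return flagged


# Toy data: three well-behaved pairs near the contig start and three
# pairs with a threefold larger insert size further along.
good_pairs = [(0, 1000), (50, 1050), (100, 1100)]
stretched_pairs = [(1500, 4500), (1600, 4600), (1700, 4700)]
suspects = flag_suspect_bins(good_pairs + stretched_pairs,
                             contig_length=5000,
                             expected_insert=1000,
                             tolerance=200,
                             bin_size=500)
print(suspects)
```

On this toy input, only bins covered by the stretched pairs are reported; bins with fewer than `min_pairs` spanning pairs are skipped rather than flagged.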

Keywords: Assembly quality; Automated error detection; De novo assembly; Error correction; Genome assembly; Insert size; Mate pair; Misassembly; Misassembly detection; Scaffolding.

Grant support

RRM is a BBSRC Ph.D. student. This work was completed during a paid internship at Illumina. Jared O’Connell, Anthony J. Cox and Ole Schulz-Trieglaff are permanent employees of Illumina Inc. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.