REAPR: a universal tool for genome assembly evaluation

Genome Biol. 2013 May 27;14(5):R47. doi: 10.1186/gb-2013-14-5-r47.

Abstract

Methods to reliably assess the accuracy of genome sequence data are lacking. Currently, completeness is described only qualitatively, and mis-assemblies are overlooked. Here we present REAPR, a tool that precisely identifies errors in genome assemblies without the need for a reference sequence. We have validated REAPR on complete genomes or de novo assemblies from bacteria, malaria and Caenorhabditis elegans, and demonstrate that 86% of the human reference genome and 82% of the mouse reference genome are error-free. When applied to an ongoing genome project, REAPR provides corrected assembly statistics, allowing the quantitative comparison of multiple assemblies. REAPR is available at http://www.sanger.ac.uk/resources/software/reapr/.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Bacteria / genetics
  • Caenorhabditis elegans / genetics
  • Genome*
  • Genomics / methods*
  • Humans
  • Models, Statistical
  • Sequence Analysis, DNA