SCARPA: scaffolding reads with practical algorithms

Bioinformatics. 2013 Feb 15;29(4):428-34. doi: 10.1093/bioinformatics/bts716. Epub 2012 Dec 29.

Abstract

Motivation: Scaffolding is the process of ordering and orienting contigs produced during genome assembly. Accurate scaffolding is essential for finishing draft assemblies, as it facilitates the costly and laborious procedures needed to fill in the gaps between contigs. Conventional formulations of the scaffolding problem are intractable, and most scaffolding programs rely on heuristic or approximate solutions, with potentially exponential running time.

Results: We present SCARPA, a novel scaffolder that combines fixed-parameter tractable and bounded algorithms with linear programming to produce near-optimal scaffolds. We test SCARPA on real datasets and on a simulated diploid genome, and compare its performance with that of several state-of-the-art scaffolders. We show that, compared with other scaffolders, SCARPA produces scaffolds that are of similar or greater length and are highly accurate. SCARPA can also detect misassembled contigs and reports them during scaffolding.
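
As a rough illustration of the "orienting" half of the scaffolding problem defined under Motivation, the following minimal sketch assigns a strand to each contig from paired-read links. It is not taken from SCARPA: the function name `orient_contigs`, the `(contig_a, contig_b, same_strand)` link format, and the contig names are all hypothetical, and the method is a plain breadth-first propagation that only works cleanly when the links are mutually consistent. Links that contradict the propagated orientation are merely reported; deciding which links to discard in that case is the hard optimization step that a real scaffolder has to address.

```python
from collections import deque


def orient_contigs(contigs, links):
    """Assign '+' or '-' to each contig by propagating link constraints.

    links: iterable of (a, b, same_strand) tuples (hypothetical format);
           same_strand is True if the link implies a and b lie on the
           same strand, False if it implies opposite strands.
    Returns (orientation, conflicts), where conflicts is a sorted list of
    contig pairs whose link contradicts the propagated orientation.
    """
    flip = {"+": "-", "-": "+"}

    # Build an undirected constraint graph over the contigs.
    adj = {c: [] for c in contigs}
    for a, b, same in links:
        adj[a].append((b, same))
        adj[b].append((a, same))

    orientation = {}
    conflicts = set()
    for start in contigs:
        if start in orientation:
            continue
        # Each connected component is anchored arbitrarily on '+'.
        orientation[start] = "+"
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, same in adj[u]:
                expected = orientation[u] if same else flip[orientation[u]]
                if v not in orientation:
                    orientation[v] = expected
                    queue.append(v)
                elif orientation[v] != expected:
                    # Inconsistent link: record it once per contig pair.
                    conflicts.add(tuple(sorted((u, v))))
    return orientation, sorted(conflicts)


if __name__ == "__main__":
    contigs = ["c1", "c2", "c3"]
    links = [("c1", "c2", False), ("c2", "c3", True)]
    print(orient_contigs(contigs, links))
```

Because the first contig of each connected component is arbitrarily placed on the forward strand, the result is only defined up to flipping a whole component. Ordering the oriented contigs and estimating gap sizes are separate steps that this sketch does not attempt.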

Availability: SCARPA is open source and available from http://compbio.cs.toronto.edu/scarpa.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Ascomycota / genetics
  • Contig Mapping / methods*
  • Escherichia coli / genetics
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing*
  • Programming, Linear
  • Software