CSAR: a contig scaffolding tool using algebraic rearrangements

Bioinformatics. 2018 Jan 1;34(1):109-111. doi: 10.1093/bioinformatics/btx543.

Abstract

Summary: Advances in next generation sequencing have generated massive amounts of short reads. However, assembling genome sequences from short reads still remains a challenging task. Due to errors in reads and large repeats in the genome, many of current assembly tools usually produce just collections of contigs whose relative positions and orientations along the genome being sequenced are still unknown. To address this issue, a scaffolding process to order and orient the contigs of a draft genome is needed for completing the genome sequence. In this work, we propose a new scaffolding tool called CSAR that can efficiently and more accurately order and orient the contigs of a given draft genome based on a reference genome of a related organism. In particular, the reference genome required by CSAR is not necessary to be complete in sequence. Our experimental results on real datasets have shown that CSAR outperforms other similar tools such as Projector2, OSLay and Mauve Aligner in terms of average sensitivity, precision, F-score, genome coverage, NGA50 and running time.

Availability and implementation: The program of CSAR can be downloaded from https://github.com/ablab-nthu/CSAR.

Contact: hchiu@mail.ncku.edu.tw or cllu@cs.nthu.edu.tw.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bacteria / genetics
  • Contig Mapping / methods*
  • Genome
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Sequence Analysis, DNA / methods*
  • Software*