FGAP: an automated gap closing tool

BMC Res Notes. 2014 Jun 18;7:371. doi: 10.1186/1756-0500-7-371.

Abstract

Background: The rapid decline in DNA sequencing costs has allowed genome data to accumulate quickly. However, obtaining complete genome sequences is still very time-consuming and labor-intensive. In addition, data produced by different sequencing technologies or by alternative assemblies remain underexploited as a means of improving incomplete genome assemblies.

Findings: We have developed FGAP, a tool for closing gaps in draft genome sequences that takes advantage of different datasets. FGAP uses BLAST to align multiple contigs against a draft genome assembly in order to find sequences that overlap gaps. The algorithm then selects the best sequence to fill and eliminate each gap.
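
The gap-closing idea described above can be illustrated with a minimal sketch. This is not FGAP's implementation: exact substring matching stands in for the BLAST alignment step, and the function names (find_gaps, close_gaps), the flank parameter, and the "shortest candidate wins" selection rule are all hypothetical choices made for illustration only.

```python
import re


def find_gaps(draft):
    """Return (start, end) spans of runs of 'N' in the draft sequence."""
    return [m.span() for m in re.finditer(r"N+", draft)]


def close_gaps(draft, contigs, flank=5):
    """Fill each gap with a contig segment that bridges both gap flanks.

    Assumption: exact string search replaces BLAST alignment, and the
    shortest bridging segment is taken as the "best" candidate.
    """
    pieces = []
    prev_end = 0
    for start, end in find_gaps(draft):
        left = draft[max(0, start - flank):start]
        right = draft[end:end + flank]
        candidates = []
        # Only attempt to close gaps that have full-length flanks on both sides.
        if len(left) == flank and len(right) == flank:
            for contig in contigs:
                i = contig.find(left)
                if i == -1:
                    continue
                j = contig.find(right, i + len(left))
                if j == -1:
                    continue
                # Sequence lying between the two flanks in the contig.
                candidates.append(contig[i + len(left):j])
        pieces.append(draft[prev_end:start])
        if candidates:
            pieces.append(min(candidates, key=len))  # placeholder selection rule
        else:
            pieces.append(draft[start:end])  # leave the gap unchanged
        prev_end = end
    pieces.append(draft[prev_end:])
    return "".join(pieces)


draft = "AAACCCGGG" + "NNNN" + "TTTGGGCCC"
contigs = ["CCCGGGACGTTTTGGG"]
closed = close_gaps(draft, contigs)
```

In this toy case the single contig contains both flanks of the gap, so the run of N characters is replaced by the bases lying between them. A real tool would also have to handle reverse-complement hits, inexact alignments, and scoring of competing candidates, none of which this sketch attempts.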

Conclusions: FGAP reduced the number of gaps by 78% in an E. coli draft genome assembly using data from two different sequencing technologies, Illumina and 454. Using PacBio long reads, 98% of the gaps were closed. In human chromosome 14 assemblies, FGAP reduced the number of gaps by 35%. All inserted sequences were validated against a reference genome using QUAST. The source code and a web tool are available at http://www.bioinfo.ufpr.br/fgap/.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Sequence
  • Chromosomes, Human, Pair 14
  • Contig Mapping / methods*
  • Contig Mapping / statistics & numerical data
  • Escherichia coli / genetics*
  • Genome, Bacterial*
  • Genome, Human*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Molecular Sequence Data
  • Software*