Bioinformatics. 2014 Jun 15;30(12):i302-9.
doi: 10.1093/bioinformatics/btu280.

Ragout-a Reference-Assisted Assembly Tool for Bacterial Genomes

Free PMC article

Mikhail Kolmogorov et al. Bioinformatics. 2014.

Abstract

Summary: Bacterial genomes are simpler than mammalian ones, yet assembling them from the data currently generated by high-throughput short-read sequencing machines still results in hundreds of contigs. To improve assembly quality, recent studies have used longer Pacific Biosciences (PacBio) reads or jumping libraries to connect contigs into larger scaffolds or to help assemblers resolve ambiguities in repetitive regions of the genome. However, the popularity of these technologies in contemporary genomic research is still limited by high cost and error rates. In this work, we explore the possibility of improving assemblies by using complete genomes from closely related species or strains. We present Ragout, a genome rearrangement approach, to address this problem. In contrast with most reference-guided algorithms, which use only one reference genome, Ragout uses multiple references, along with the evolutionary relationships among them, to determine the correct order of the contigs. Additionally, Ragout uses the assembly graph and multi-scale synteny blocks to reduce assembly gaps caused by small contigs from the input assembly. Based on results from simulated and real datasets, we believe that for common bacterial species, for which many complete genome sequences from related strains are available, the current high-throughput short-read sequencing paradigm is sufficient to obtain a single high-quality scaffold for each chromosome.

Availability: The Ragout software is freely available at: https://github.com/fenderglass/Ragout.

Figures

Fig. 1.
(a) A breakpoint graph of three reference genomes and one assembly. The three reference genomes (Ref1, Ref2 and Ref3) are presented as cyclic permutations of synteny blocks: Ref1 (blue): +1 +2 +3 +4 +5, Ref2 (green): +1 +3 +4 +5 and Ref3 (orange): +1 −4 −3 +5, respectively. The target assembly (red) is presented as four separate permutations (corresponding to four contigs): Target Assembly: +1 | +2 +3 | +4 | +5. (b) A phylogenetic tree representing the states of the half-breakpoint 5h. Each leaf is labeled by the state of the half-breakpoint 5h in the corresponding reference/target genome. (According to this tree, the state of 5h in the target genome is 4t, although the correct state of 5h in the target genome is unknown.)
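
The permutations in the Fig. 1 caption can be turned into adjacencies with a few lines of Python. This is a minimal sketch, not Ragout's implementation: the function names are hypothetical, and the end-labeling convention (a `+` block is entered at its head `h` and left at its tail `t`) is an assumption chosen so that the output agrees with the caption's example, where 5h is joined to 4t. It lists the state of the half-breakpoint 5h in each reference and shows that 5h has no partner in the target contigs.

```python
def ends(block):
    """Return (left_end, right_end) labels of a signed synteny block."""
    name = str(abs(block))
    # Assumed convention: '+' blocks read head -> tail, '-' blocks tail -> head.
    return (name + "h", name + "t") if block > 0 else (name + "t", name + "h")


def adjacencies(perm, circular):
    """Set of adjacencies (unordered pairs of block ends) in one permutation."""
    pairs = list(zip(perm, perm[1:]))
    if circular and perm:
        pairs.append((perm[-1], perm[0]))  # close the cycle
    return {frozenset((ends(a)[1], ends(b)[0])) for a, b in pairs}


def state_of(vertex, adjs):
    """Block end joined to `vertex`, or None if `vertex` is unmatched."""
    for adj in adjs:
        if vertex in adj:
            return next(iter(adj - {vertex}), vertex)
    return None


# Signed permutations copied from the Fig. 1 caption.
references = {
    "Ref1": [1, 2, 3, 4, 5],
    "Ref2": [1, 3, 4, 5],
    "Ref3": [1, -4, -3, 5],
}
target_contigs = [[1], [2, 3], [4], [5]]

# References are cyclic; contigs are linear fragments.
states = {name: state_of("5h", adjacencies(perm, circular=True))
          for name, perm in references.items()}

target_adjs = set()
for contig in target_contigs:
    target_adjs |= adjacencies(contig, circular=False)
target_state = state_of("5h", target_adjs)

print(states)
print(target_state)
```

Under this convention Ref1 and Ref2 agree on the state of 5h while Ref3 differs, which is the situation where the phylogenetic tree in panel (b) is needed to pick a state for the target.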
Fig. 2.
Merging two scaffolds As and Aw built from two different synteny scales into a scaffold M. Yellow rectangles represent weak contigs. (a) As and Aw are consistent. (b) As and Aw are not consistent
Fig. 3.
Refinement with the assembly graph. The procedure fills scaffold gaps with small contigs. The big circles illustrate contigs with known order, while small ones correspond to contigs that were not considered in rearrangement analysis
Fig. 4.
Phylogenetic trees. (a) Helicobacter pylori with SJM180 as target. (b) Vibrio cholerae with H1 as target. (c) Staphylococcus aureus with USA300 as target. (d) Simulated genomes. Solid branches contain all types of rearrangements, while dashed branches contain only indels
Fig. 5.
(a−d) Dot plots of H. pylori references versus target genomes. (e) Dot plot of Ragout’s scaffold versus the target genome showing a perfect diagonal line for visual verification
Fig. 6.
Dot plots of different chromosomes of V. cholerae references (a–c) versus the corresponding chromosomes of the target genome showing rearrangements
Fig. 7.
Correspondence of the number of available references with the number of misordered contigs for Ragout
