Genome Biol. 2015 Dec 29;16:294.
doi: 10.1186/s13059-015-0849-0.

Circlator: Automated Circularization of Genome Assemblies Using Long Sequencing Reads

Free PMC article

Martin Hunt et al. Genome Biol. 2015;16:294.

Abstract

The assembly of DNA sequence data is undergoing a renaissance thanks to emerging technologies capable of producing reads tens of kilobases long. Assembling complete bacterial and small eukaryotic genomes is now possible, but the final step of circularizing sequences remains unsolved. Here we present Circlator, the first tool to automate assembly circularization and produce accurate linear representations of circular sequences. Using Pacific Biosciences and Oxford Nanopore data, Circlator correctly circularized 26 of 27 circularizable sequences, comprising 11 chromosomes and 12 plasmids from bacteria, the apicoplast and mitochondrion of Plasmodium falciparum and a human mitochondrion. Circlator is available at http://sanger-pathogens.github.io/circlator/ .
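A circular molecule has no natural start point, so every rotation of it, read on either strand, is an equally valid linear representation. The sketch below, which is not part of Circlator, illustrates what it means for two linear strings to represent the same circle. It assumes exact, error-free sequences; real assemblies would be compared with an aligner such as nucmer. The function names are hypothetical.

```python
# Minimal sketch, not part of Circlator: test whether two linear strings
# represent the same circular DNA molecule (any rotation, either strand).
# Assumes exact, error-free sequences containing only A, C, G and T.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def same_circular_sequence(a: str, b: str) -> bool:
    """True if b is a rotation of a, or of the reverse complement of a."""
    if len(a) != len(b):
        return False
    doubled = a + a  # every rotation of a occurs as a substring of a + a
    # revcomp(b) is a rotation of a exactly when b is a rotation of revcomp(a)
    return b in doubled or reverse_complement(b) in doubled


if __name__ == "__main__":
    print(same_circular_sequence("ATGCCGTA", "CGTAATGC"))  # True (rotation)
    print(same_circular_sequence("ATGCCGTA", "TACGGCAT"))  # True (other strand)
    print(same_circular_sequence("ATGCCGTA", "ATGCCGTT"))  # False
```

Doubling the sequence turns the rotation test into a plain substring search, which keeps the sketch short; it is quadratic in the worst case, which is acceptable for illustration but not for whole chromosomes.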

Figures

Fig. 1
Typical issues in contigs produced by long-read assemblers that represent circular sequences. In each example, the assembly is a single contig, colored with a mix of green and blue, and the reference is shown in gray. Matches between the reference and the assembly are shown in light blue. The plot below each reference sequence shows the number of matches to the assembly at each position of the reference. (a) The two ends of the contig are low-quality copies of the same sequence and need to be resolved into a single copy. (b) The contig is missing sequence. (c) A small circular sequence is assembled into multiple tandem copies.
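Problems (a) and (c) above both reduce to finding repeated sequence within a single contig. The sketch below shows exact-match versions of the two fixes. It is not Circlator's implementation: real contig ends contain sequencing errors, so a practical tool would work from alignments rather than exact string comparison. The function names and the `min_overlap` threshold are hypothetical.

```python
# Minimal sketch, not Circlator's implementation: exact-match versions of the
# fixes for Fig. 1a and Fig. 1c. Real contig ends contain errors, so a real tool
# would work from alignments; function names and min_overlap are hypothetical.


def trim_overlapping_ends(contig: str, min_overlap: int = 20) -> str:
    """Fig. 1a: if the contig ends with an exact copy of its own start, drop it.

    Returns the contig unchanged when no suffix/prefix overlap of at least
    min_overlap bases exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Check the longest candidate overlap first; O(n^2) is fine for a sketch.
    for length in range(len(contig) - 1, min_overlap - 1, -1):
        if contig[-length:] == contig[:length]:
            return contig[:-length]
    return contig


def collapse_tandem_copies(contig: str) -> str:
    """Fig. 1c: reduce a contig made of whole, identical tandem copies to one copy.

    Returns the shortest prefix that, repeated, reproduces the contig exactly;
    a contig with no such repeat is returned unchanged.
    """
    n = len(contig)
    for period in range(1, n):
        if n % period == 0 and contig[:period] * (n // period) == contig:
            return contig[:period]
    return contig


if __name__ == "__main__":
    start, middle = "ACGTTGCA", "GGATCCTTAA"
    print(trim_overlapping_ends(start + middle + start, min_overlap=5))  # ACGTTGCAGGATCCTTAA
    print(collapse_tandem_copies(start * 3))  # ACGTTGCA
```

The `min_overlap` threshold guards against trimming short, chance matches between the two ends; its default here is arbitrary.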
Fig. 2
Comparison of the HGAP assembly of the P. falciparum mitochondrion and the Circlator output. The HGAP and Circlator assemblies are shown in gray and white, respectively, with the numbers showing the lengths in kilobases. Nucmer matches between the genomes are shown in blue (hits in the same orientation) and pink (hits in opposing orientations). Matches to the three mitochondrial genes, cox1 (blue), cox3 (green), and cob (orange), are shown as a colored track inside the assemblies. The corrected reads mapped to each assembly are shown in gray outside the assemblies. This figure was generated using Circos [31].
Fig. 3
Key stages of the Circlator pipeline. (a) Before circularization, input contigs are merged using de novo assemblies of filtered reads. (b) Circular contigs are resolved using matches to contigs assembled from the filtered reads. (c) Circularized contigs are rearranged to start at the dnaA gene, or at a different gene specified by the user.
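Stage (c) amounts to choosing a new linear representation of the same circle. The sketch below shows only the rotation step and assumes the gene's position and strand are already known; locating dnaA (or a user-specified gene) is not shown. It is not Circlator's code, and the function name and 0-based coordinate convention are hypothetical.

```python
# Minimal sketch, not Circlator's code: rotate a circularized contig so it
# begins at the first base of a gene such as dnaA. Assumes the gene's position
# and strand are already known; finding the gene is not shown. The function name
# and the 0-based coordinate convention are hypothetical.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def rotate_to_gene_start(contig: str, gene_start: int, forward: bool = True) -> str:
    """Return the rotation of contig that starts at the gene's first base.

    gene_start is the 0-based forward-strand coordinate of the gene's first
    base (the first base of its start codon). For a reverse-strand gene this is
    the rightmost base of the gene, and the contig is reverse-complemented so
    that the gene reads forward in the output.
    """
    if not 0 <= gene_start < len(contig):
        raise ValueError("gene_start is outside the contig")
    if not forward:
        # Position p on the forward strand is position n - 1 - p on the reverse strand.
        gene_start = len(contig) - 1 - gene_start
        contig = reverse_complement(contig)
    return contig[gene_start:] + contig[:gene_start]


if __name__ == "__main__":
    print(rotate_to_gene_start("CCCATGAAAGGG", 3))  # ATGAAAGGGCCC
    # Same gene on the reverse strand: its start codon's first base is at index 8.
    print(rotate_to_gene_start("CCCTTTCATGGG", 8, forward=False))  # ATGAAAGGGCCC
```

Rotation and reverse complementation change only the linear representation, not the circular molecule itself, so this step cannot introduce or remove sequence.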



References

    1. Staden R. A strategy of DNA sequencing employing computer programs. Nucleic Acids Res. 1979;6(7):2601–10. doi: 10.1093/nar/6.7.2601.
    2. Koren S, Phillippy AM. One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly. Curr Opin Microbiol. 2015;23:110–20. doi: 10.1016/j.mib.2014.11.014.
    3. Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods. 2013;10(6):563–9. doi: 10.1038/nmeth.2474.
    4. Berlin K, Koren S, Chin CS, Drake JP, Landolin JM, Phillippy AM. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat Biotechnol. 2015;33(6):623–30. doi: 10.1038/nbt.3238.
    5. SPRAI: Single pass read accuracy improver. http://zombie.cb.k.u-tokyo.ac.jp/sprai/index.html. Accessed 19 Nov 2014.
