, 5 (9)

Comparison of Long-Read Sequencing Technologies in the Hybrid Assembly of Complex Bacterial Genomes

Nicola De Maio et al. Microb Genom.


Illumina sequencing allows rapid, cheap and accurate whole genome bacterial analyses, but short reads (<300 bp) do not usually enable complete genome assembly. Long-read sequencing greatly assists with resolving complex bacterial genomes, particularly when combined with short-read Illumina data (hybrid assembly). However, it is not clear how different long-read sequencing methods affect hybrid assembly accuracy. Relative automation of the assembly process is also crucial to facilitating high-throughput complete bacterial genome reconstruction, avoiding multiple bespoke filtering and data manipulation steps. In this study, we compared hybrid assemblies for 20 bacterial isolates, including two reference strains, using Illumina sequencing and long reads from either Oxford Nanopore Technologies (ONT) or SMRT Pacific Biosciences (PacBio) sequencing platforms. We chose isolates from the family Enterobacteriaceae, as these frequently have highly plastic, repetitive genetic structures, and complete genome reconstruction for these species is relevant for a precise understanding of the epidemiology of antimicrobial resistance. We de novo assembled genomes using the hybrid assembler Unicycler and compared different read processing strategies, as well as comparing to long-read-only assembly with Flye followed by short-read polishing with Pilon. Hybrid assembly with either PacBio or ONT reads facilitated high-quality genome reconstruction, and was superior to the long-read assembly and polishing approach evaluated with respect to accuracy and completeness. Combining ONT and Illumina reads fully resolved most genomes without additional manual steps, and at a lower consumables cost per isolate in our setting. Automated hybrid assembly is a powerful tool for complete and accurate bacterial genome assembly.

Keywords: Enterobacteriaceae; bacterial genomics; hybrid assembly; long-read sequencing; plasmid assembly.

Conflict of interest statement

The authors declare that there are no conflicts of interest.


Fig. 1.
Examples of genome structure uncertainty in hybrid assemblies in (a) the chromosome and (b) the accessory genome. (a) An ONT+Illumina hybrid assembly for isolate RHBSTW-00029 using the ‘Basic’ long-read preparation strategy. (b) A PacBio+Illumina hybrid assembly for isolate MGH78578 using the ‘Corrected’ long-read preparation strategy. Plots were obtained using Bandage on the ‘assembly.gfa’ output file from Unicycler, with grey boxes indicating unresolved structures. Each contig is annotated with contig length and Illumina coverage; connections between contigs represent overlaps between contig ends. The assembly for RHBSTW-00029 in (a) and that of isolate RHB14-C01 (which showed a similar pattern of chromosome structure uncertainty) were the only two datasets that could not be completely assembled with any of the attempted strategies using ONT+Illumina data. They were also not fully assembled by any PacBio+Illumina strategy; the PacBio+Illumina strategies additionally failed to completely assemble isolates RHBSTW-00189, RHBSTW-00277, RHBSTW-340 and CFT073 (Figure S4). The pattern in (b) was only observed for PacBio+Illumina data, and was the reason for incomplete assemblies for isolates RHBSTW-00123, RHBSTW-00131, RHBSTW-00142, RHBSTW-00167 and MGH78578 (Figure S5).
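The unresolved structures described in this caption can also be spotted without a viewer by reading the assembly graph directly. The following is a minimal sketch, assuming a GFA version 1 file such as the 'assembly.gfa' mentioned above; the function name `summarize_gfa` and the rule "a contig linked to a different contig counts as unresolved" are illustrative assumptions, not the completeness check used by Unicycler or by the authors.

```python
def summarize_gfa(gfa_text):
    """Summarize a GFA 1 assembly graph given as a string.

    Heuristic (an assumption for illustration): a segment is treated as
    circular if it links to itself in the same orientation, and as
    unresolved if it links to any other segment or to itself in the
    opposite orientation.
    """
    segments = {}      # segment name -> length in bases
    circular = set()   # segments with a same-orientation self-link
    unresolved = set() # segments tangled with other segments

    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            name, seq = fields[1], fields[2]
            length = 0 if seq == "*" else len(seq)
            # An optional LN:i: tag carries the length when the sequence is omitted.
            for tag in fields[3:]:
                if tag.startswith("LN:i:"):
                    length = int(tag[5:])
            segments[name] = length
        elif fields[0] == "L" and len(fields) >= 5:
            src, src_orient, dst, dst_orient = fields[1:5]
            if src == dst and src_orient == dst_orient:
                circular.add(src)
            else:
                unresolved.update((src, dst))

    return {
        "segments": len(segments),
        "total_length": sum(segments.values()),
        "circular": sorted(circular - unresolved),
        "unresolved": sorted(unresolved),
        "fully_resolved": not unresolved,
    }


# Toy graph: contig 1 is a closed circle, contigs 2 and 3 are still joined.
example = (
    "S\t1\tACGTACGT\n"
    "L\t1\t+\t1\t+\t0M\n"
    "S\t2\t*\tLN:i:100\n"
    "S\t3\tACGT\n"
    "L\t2\t+\t3\t-\t0M\n"
)
summary = summarize_gfa(example)
```

In this toy input, contig 1 would be reported as circular and contigs 2 and 3 as unresolved, which corresponds to the kind of structure a grey box marks in the figure.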
Fig. 2.
Examples of mismatches identified between the ONT-based and the PacBio-based assemblies for the two reference strains (E. coli CFT073 and K. pneumoniae MGH78578). Each sub-figure is an IGV (v2.4.3) view of part of the PacBio-based assembly, centred around a PacBio-ONT SNP, with all reads from the same isolate mapped to it. We performed this analysis for all SNPs in isolates MGH78578 and CFT073, and report examples for the two most typical patterns observed. (a) SNP from MGH78578 with very low Illumina coverage, but normal PacBio and ONT coverage. Most of the Illumina reads have a different base than the one in the PacBio-assembled reference (the red T's), suggesting perhaps an error in the PacBio assembly. A similar pattern is observed in 14 SNPs in CFT073 (with 12 due to error in the PacBio assembly), and 11 SNPs in MGH78578 (with 10 due to error in the PacBio assembly). (b) SNP from MGH78578 with normal Illumina coverage; Illumina reads support both bases with similar proportions, suggesting that this could be a polymorphic site within the original DNA sample. This pattern was observed for four SNPs in CFT073 and for 13 SNPs in MGH78578.
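The two patterns in this caption can be expressed as a simple rule on Illumina base counts at a PacBio-ONT SNP. The sketch below is only an illustration: the function `classify_site`, its labels, and both thresholds (`low_depth_fraction`, `balance`) are hypothetical choices, not values or methods reported by the authors, who inspected these sites visually in IGV.

```python
def classify_site(ref_base, illumina_counts, typical_depth,
                  low_depth_fraction=0.25, balance=0.3):
    """Label a SNP site from Illumina base counts.

    ref_base        -- base in the PacBio-based assembly at this site
    illumina_counts -- dict mapping base -> number of Illumina reads
    typical_depth   -- typical Illumina depth for the isolate
    The two thresholds are hypothetical and chosen only for illustration.
    """
    depth = sum(illumina_counts.values())
    if depth == 0:
        return "no_illumina_evidence"

    ref_fraction = illumina_counts.get(ref_base, 0) / depth
    low_coverage = depth < low_depth_fraction * typical_depth

    # Pattern (a): very low Illumina coverage, most reads disagree with the reference.
    if low_coverage and ref_fraction < 0.5:
        return "possible_reference_error"
    # Pattern (b): normal coverage, both bases supported in similar proportions.
    if not low_coverage and balance <= ref_fraction <= 1 - balance:
        return "possible_polymorphism"
    return "unclassified"


pattern_a = classify_site("C", {"T": 5, "C": 1}, typical_depth=100)
pattern_b = classify_site("C", {"C": 48, "T": 52}, typical_depth=100)
other = classify_site("C", {"C": 99, "T": 1}, typical_depth=100)
```

A rule like this could at most pre-sort sites for manual review; the caption's own wording ("suggesting perhaps") indicates that each call still needs inspection of the mapped reads.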



