Gigascience. 2019 May 1;8(5):giz043.
doi: 10.1093/gigascience/giz043.

Ultra-deep, long-read nanopore sequencing of mock microbial community standards

Samuel M Nicholls et al. Gigascience.

Abstract

Background: Long sequencing reads are information-rich, aiding de novo assembly and reference mapping, and consequently have great potential for the study of microbial communities. However, the best approaches for analysis of long-read metagenomic data are unknown. Additionally, rigorous evaluation of bioinformatics tools is hindered by a lack of long-read data from validated samples with known composition.

Findings: We sequenced 2 commercially available mock communities containing 10 microbial species (ZymoBIOMICS Microbial Community Standards) with Oxford Nanopore GridION and PromethION. Both communities and the 10 individual species isolates were also sequenced with Illumina technology. We generated 14 and 16 gigabase pairs from 2 GridION flowcells and 150 and 153 gigabase pairs from 2 PromethION flowcells for the evenly distributed and log-distributed communities, respectively. Read length N50 ranged between 5.3 and 5.4 kilobase pairs over the 4 sequencing runs. Basecalls and corresponding signal data are made available (4.2 TB in total). Alignment to Illumina-sequenced isolates demonstrated the expected microbial species at anticipated abundances, with the limit of detection for the lowest abundance species below 50 cells (GridION). De novo assembly of metagenomes recovered long contiguous sequences without the need for pre-processing techniques such as binning.

Conclusions: We present ultra-deep, long-read nanopore datasets from a well-defined mock community. These datasets will be useful for those developing bioinformatics methods for long-read metagenomics and for the validation and comparison of current laboratory and software pipelines.

Keywords: de novo assembly; Illumina; benchmark; bioinformatics; metagenomics; mock community; nanopore; real-time sequencing; single-molecule sequencing.
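
The read length N50 reported in the Findings is the length L such that reads of length L or longer together contain at least half of all sequenced bases. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `read_length_n50` and the toy read lengths are hypothetical and chosen only for illustration.

```python
def read_length_n50(lengths):
    """Return the N50 of a collection of read lengths (in bases).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not lengths:
        raise ValueError("need at least one read length")
    # Walk from the longest read downwards, accumulating bases.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first read that takes the running sum to half of the
        # total yield defines the N50.
        if running >= half_total:
            return length


# Toy example (made-up lengths): total is 20 bases, half is 10.
# The two longest reads (6 + 5 = 11) already cover half, so N50 is 5.
toy_lengths = [2, 3, 4, 5, 6]
n50 = read_length_n50(toy_lengths)
```

Because the statistic is weighted by bases rather than by read count, a few very long reads can raise N50 well above the median read length.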

Figures

Figure 1
Summary plots for the 4 generated data sets: (a) collector’s curve showing sequencing yield over time for each of the 4 sequencing runs, (b) density plot showing sequence accuracy (BLAST-like identities), (c) density plot showing sequencing speed over time by sequencing experiment.
Figure 2
Proportion of sequenced bases assigned by minimap2 to each of the 10 organisms that were sequenced (x-axis), against the proportion of yield expected given the known composition (y-axis) of the Zymo CSII (Log) standard.
Figure 3
Bar plots showing total length and contiguity of genomic assemblies obtained with wtdbg2 from each of the long-read nanopore data sets. For each organism in the community (coloured columns), contigs longer than 10 kb are horizontally stacked along the x-axis. Each row represents a run of wtdbg2, with the parameters for edge support, read length threshold, and homopolymer-compressed k-mer size labelled on the left. Assemblies are grouped by the data set on which they were run (row facets). Assemblies may also be compared to the estimated true genome size, the available McIntyre et al. [17] PacBio assemblies, and the per-isolate Illumina SPAdes assemblies. Estimated genome sizes are the same as those found in Table 1; however, to display approximate chromosomes, the 2 yeasts were replaced by their corresponding canonical National Center for Biotechnology Information references for visualization purposes only. The C. neoformans strain used by the Zymo standards is a diploid genetic cross, which may explain why its assemblies are larger than the estimated haploid size shown.

References

    1. Handelsman J. Metagenomics: application of genomics to uncultured microorganisms. Microbiol Mol Biol Rev. 2004;68(4):669–85.
    2. Hug LA, Baker BJ, Anantharaman K, et al. A new view of the tree of life. Nat Microbiol. 2016;1:16048.
    3. Quince C, Walker AW, Simpson JT, et al. Shotgun metagenomics, from sampling to analysis. Nat Biotechnol. 2017;35(9):833–44.
    4. Jain M, Koren S, Miga KH, et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol. 2018;36:338.
    5. Payne A, Holmes N, Rakyan V, et al. BulkVis: a graphical viewer for Oxford nanopore bulk FAST5 files. Bioinformatics. 2018. doi:10.1093/bioinformatics/bty841.