Optimizing de novo genome assembly from PCR-amplified metagenomes

PeerJ. 2019 May 9;7:e6902. doi: 10.7717/peerj.6902. eCollection 2019.


Background: Metagenomics has transformed our understanding of microbial diversity across ecosystems, with recent advances enabling de novo assembly of genomes from metagenomes. These metagenome-assembled genomes are critical for providing ecological, evolutionary, and metabolic context for all the microbes and viruses yet to be cultivated. Metagenomes can now be generated from nanogram to subnanogram amounts of DNA. However, these libraries require several rounds of PCR amplification before sequencing, and recent data suggest these typically yield smaller and more fragmented assemblies than regular metagenomes.

Methods: Here we evaluate de novo assembly methods for 169 PCR-amplified metagenomes, including 25 for which an unamplified counterpart is available, to optimize specific assembly approaches for PCR-amplified libraries. We first evaluated coverage bias by mapping reads from PCR-amplified metagenomes onto reference contigs obtained from unamplified metagenomes of the same samples. Then, we compared different assembly pipelines in terms of assembly size (number of bp in contigs ≥10 kb) and error rates to evaluate which are best suited for PCR-amplified metagenomes.
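
As a rough illustration of the two quantities named in the Methods, the sketch below computes assembly size as the total number of bp in contigs of at least 10 kb, and summarizes coverage unevenness as a coefficient of variation of per-position depth. The function names are hypothetical, and the coefficient of variation is an assumption chosen for illustration; the abstract does not state which unevenness statistic was used.

```python
from statistics import mean, pstdev


def assembly_size(contig_lengths, min_len=10_000):
    """Total bp in contigs of at least min_len (10 kb, as in the abstract)."""
    return sum(n for n in contig_lengths if n >= min_len)


def coverage_cv(depths):
    """Coefficient of variation of per-position depth of coverage.

    Assumption: CV is used here only as one simple unevenness measure;
    0.0 means perfectly even coverage, larger values mean more uneven.
    """
    mu = mean(depths)
    if mu == 0:
        return 0.0
    return pstdev(depths) / mu


if __name__ == "__main__":
    # Hypothetical contig lengths (bp) and per-position depths.
    print(assembly_size([5_000, 10_000, 25_000]))
    print(coverage_cv([10, 10, 10, 10]))
    print(coverage_cv([0, 20]))
```

Comparing the same contig's `coverage_cv` in a PCR-amplified library and in its unamplified counterpart would be one simple way to express the coverage bias described here.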

Results: Read mapping analyses revealed that the depth of coverage within individual genomes is significantly more uneven in PCR-amplified datasets than in unamplified metagenomes, with regions of high depth of coverage enriched in short inserts. This enrichment scales with the number of PCR cycles performed, and is presumably due to preferential amplification of short inserts. Standard assembly pipelines are confounded by this type of coverage unevenness, so we evaluated other assembly options to mitigate these issues. We found that a pipeline combining read deduplication and an assembly algorithm originally designed to recover genomes from libraries generated after whole genome amplification (single-cell SPAdes) frequently improved assembly of contigs ≥10 kb by 10- to 100-fold for low-input metagenomes.
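
The read deduplication step can be pictured with the minimal sketch below, which keeps the first copy of each read pair and drops exact duplicates. This is an assumption made for illustration: the abstract does not name the deduplication tool or its matching criteria, and `deduplicate_pairs` is a hypothetical helper, not the authors' implementation.

```python
def deduplicate_pairs(pairs):
    """Drop exact duplicate read pairs, keeping first occurrences in order.

    Each item in `pairs` is a (forward_sequence, reverse_sequence) tuple.
    Assumption: only pairs identical in both mates count as duplicates;
    real deduplication tools may also tolerate mismatches.
    """
    seen = set()
    unique = []
    for forward, reverse in pairs:
        key = (forward, reverse)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


if __name__ == "__main__":
    # Hypothetical reads: the second pair is a PCR duplicate of the first.
    reads = [("ACGT", "TTGA"), ("ACGT", "TTGA"), ("ACGT", "CCCC")]
    print(deduplicate_pairs(reads))
```

The deduplicated reads would then be passed to the assembler; SPAdes exposes its single-cell mode through the `--sc` option, although the exact settings used in the study are not given in the abstract.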

Conclusions: PCR-amplified metagenomes have enabled scientists to explore communities traditionally challenging to describe, including some with extremely low biomass or from which DNA is particularly difficult to extract. Here we show that a modified assembly pipeline can lead to improved de novo genome assembly from PCR-amplified datasets, and enable better genome recovery from low-input metagenomes.

Keywords: Genome assembly; Metagenomics; Microbial ecology; Viral metagenomics.

Grant support

Delaware Bay samples (Yuanchao Zhan, David Marsan, Feng Chen) were collected through a research cruise supported by a National Science Foundation grant (OCE-0825468). San Pedro Ocean Time series samples (Jed Fuhrman) were collected and processed through research support from National Science Foundation grants OCE-1031743, OCE-1136818 and OCE-1737409. Sampling and extraction of thawing permafrost soil samples from Stordalen Mire (Virginia I Rich, Gareth Trubl, Matthew B Sullivan) was funded by the Genomic Science Program of the United States Department of Energy Office of Biological and Environmental Research, grants DE-SC0004632, DE-SC0010580, and DE-SC0016440, which also supported Gareth Trubl. This work was supported by the U.S. Department of Energy, Office of Science, Office of Workforce Development for Teachers and Scientists, Office of Science Graduate Student Research (SCGSR) program. The SCGSR program is administered by the Oak Ridge Institute for Science and Education (ORISE) for the DOE. ORISE is managed by ORAU under contract number DE-SC0014664. This work was also supported by Gordon & Betty Moore Foundation grants 3790 and 5488 to Matthew B. Sullivan and grant 3779 to Jed Fuhrman, and the US Department of Energy Office of Science, Office of Biological and Environmental Research Early Career Program under contract number DE-AC02-05CH11231 to TRN. The work conducted by the U.S. Department of Energy Joint Genome Institute is supported by the Office of Science of the U.S. Department of Energy under contract no. DE-AC02-05CH11231. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.