Nonpareil 3: Fast Estimation of Metagenomic Coverage and Sequence Diversity
- PMID: 29657970
- PMCID: PMC5893860
- DOI: 10.1128/mSystems.00039-18
Abstract
Estimations of microbial community diversity based on metagenomic data sets are affected, often to an unknown degree, by biases derived from insufficient coverage and reference database-dependent estimations of diversity. For instance, the completeness of reference databases cannot be generally estimated since it depends on the extant diversity sampled to date, which, with the exception of a few habitats such as the human gut, remains severely undersampled. Further, estimation of the degree of coverage of a microbial community by a metagenomic data set is prohibitively time-consuming for large data sets, and coverage values may not be directly comparable between data sets obtained with different sequencing technologies. Here, we extend Nonpareil, a database-independent tool for the estimation of coverage in metagenomic data sets, to a high-performance computing implementation that scales up to hundreds of cores and includes, in addition, a k-mer-based estimation as sensitive as the original alignment-based version but about three hundred times as fast. Further, we propose a metric of sequence diversity (Nd) derived directly from Nonpareil curves that correlates well with alpha diversity assessed by traditional metrics. We use this metric in different experiments demonstrating the correlation with the Shannon index estimated on 16S rRNA gene profiles and show that Nd additionally reveals seasonal patterns in marine samples that are not captured by the Shannon index and more precise rankings of the magnitude of diversity of microbial communities in different habitats. Therefore, the new version of Nonpareil, called Nonpareil 3, advances the toolbox for metagenomic analyses of microbiomes.
IMPORTANCE Estimation of the coverage provided by a metagenomic data set, i.e., what fraction of the microbial community was sampled by DNA sequencing, represents an essential first step of every culture-independent genomic study that aims to robustly assess the sequence diversity present in a sample. However, estimation of coverage remains elusive because of several technical limitations associated with high computational requirements and limiting statistical approaches to quantify diversity. Here we describe Nonpareil 3, a new bioinformatics algorithm that circumvents several of these limitations and thus can facilitate culture-independent studies in clinical or environmental settings, independent of the sequencing platform employed. In addition, we present a new metric of sequence diversity based on rarefied coverage and demonstrate its use in communities from diverse ecosystems.
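The idea of estimating coverage without a reference database, from how often reads in a data set overlap one another, can be illustrated with a toy sketch. The code below is not Nonpareil's implementation: the function names (`kmers`, `redundancy_curve`), the k-mer length, the subsample fractions, and the rule "two reads match if they share at least one k-mer" are all assumptions chosen for brevity. It only shows how a rarefied redundancy curve could be computed from raw reads.

```python
import random


def kmers(read, k):
    """Return the set of distinct k-mers in a read (hypothetical helper)."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def redundancy_curve(reads, k=8, fractions=(0.25, 0.5, 1.0), seed=0):
    """Toy rarefied redundancy curve.

    For each subsample fraction, returns (subsample size, share of
    subsampled reads that share at least one k-mer with another read
    in the same subsample). Matching on a single shared k-mer is an
    illustrative assumption, not the criterion used by Nonpareil.
    """
    rng = random.Random(seed)
    curve = []
    for fraction in fractions:
        # Subsample size, kept between 2 and the total number of reads.
        n = min(len(reads), max(2, int(len(reads) * fraction)))
        sample = rng.sample(reads, n)

        # Count how many reads in the subsample contain each k-mer.
        counts = {}
        for read in sample:
            for kmer in kmers(read, k):
                counts[kmer] = counts.get(kmer, 0) + 1

        # A read is redundant if any of its k-mers occurs in another read.
        redundant = sum(
            1 for read in sample
            if any(counts[kmer] > 1 for kmer in kmers(read, k))
        )
        curve.append((n, redundant / n))
    return curve
```

Calling `redundancy_curve(reads)` on a list of read strings returns one point per subsample fraction. In a well-covered data set the redundant share would be expected to approach 1 as the subsample grows, while a highly diverse, undersampled community would stay low over the same range.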
Keywords: bioinformatics; coverage; metagenomics; microbial ecology.