Temporal probabilistic modeling of bacterial compositions derived from 16S rRNA sequencing
- PMID: 28968799
- PMCID: PMC5860357
- DOI: 10.1093/bioinformatics/btx549
Abstract
Motivation: The number of microbial and metagenomic studies has increased drastically due to advancements in next-generation sequencing-based measurement techniques. Statistical analysis of (time series) 16S rRNA and other metagenomic sequencing data, and the validity of conclusions drawn from them, are hampered by a significant amount of noise and missing data (sampling zeros). Accounting for uncertainty in microbiome data is often challenging due to the difficulty of obtaining biological replicates. Additionally, the compositional nature of current amplicon and metagenomic data differs from many other biological data types, adding another challenge to the data analysis.
Results: To address these challenges in human microbiome research, we introduce a novel probabilistic approach to explicitly model overdispersion and sampling zeros by considering the temporal correlation between nearby time points using Gaussian Processes. The proposed Temporal Gaussian Process Model for Compositional Data Analysis (TGP-CODA) shows superior modeling performance compared to commonly used Dirichlet-multinomial, multinomial and non-parametric regression models on real and synthetic data. We demonstrate that the nonreplicative nature of human gut microbiota studies can be partially overcome by our method with proper experimental design of dense temporal sampling. We also show that different modeling approaches have a strong impact on ecological interpretation of the data, such as stationarity, persistence and environmental noise models.
Availability and implementation: A Stan implementation of the proposed method is available under MIT license at https://github.com/tare/GPMicrobiome.
Contact: taijo@flatironinstitute.org or rb113@nyu.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2017. Published by Oxford University Press.
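The Results paragraph describes modeling compositions over time with Gaussian Processes so that nearby time points share information. As a rough illustration of that idea only, the following Python sketch simulates count data from a generic latent-GP model: smooth latent trajectories are drawn from a Gaussian Process, mapped to proportions with a softmax, and then sampled as multinomial read counts. This is not the TGP-CODA model itself, which is implemented in Stan in the repository named above; the function names, the squared-exponential kernel, the length scale, the number of taxa and the sequencing depth are all assumptions chosen for illustration.

```python
import numpy as np


def squared_exponential_kernel(times, length_scale=3.0, variance=1.0):
    """Covariance between time points; nearby points are strongly correlated."""
    diffs = times[:, None] - times[None, :]
    return variance * np.exp(-0.5 * (diffs / length_scale) ** 2)


def softmax(latent):
    """Map each row of real-valued latent scores to proportions summing to 1."""
    shifted = latent - latent.max(axis=1, keepdims=True)  # numerical stability
    exponentiated = np.exp(shifted)
    return exponentiated / exponentiated.sum(axis=1, keepdims=True)


def simulate_counts(times, n_taxa=4, depth=1000, seed=0):
    """Simulate (proportions, counts), each of shape (n_times, n_taxa).

    Hypothetical helper: illustrative generative sketch, not the authors' model.
    """
    rng = np.random.default_rng(seed)
    # Small diagonal jitter keeps the covariance numerically positive definite.
    cov = squared_exponential_kernel(times) + 1e-6 * np.eye(len(times))
    # One independent GP draw per taxon, transposed to (n_times, n_taxa).
    latent = rng.multivariate_normal(np.zeros(len(times)), cov, size=n_taxa).T
    proportions = softmax(latent)
    # Sequencing is modeled as multinomial sampling at a fixed depth, so rare
    # taxa can receive zero reads (sampling zeros) despite nonzero proportions.
    counts = np.vstack([rng.multinomial(depth, p) for p in proportions])
    return proportions, counts


times = np.arange(0.0, 20.0)  # 20 assumed, evenly spaced sampling days
proportions, counts = simulate_counts(times)
```

Each row of `counts` sums to the assumed sequencing depth, which makes the compositional constraint explicit: the data carry information about relative abundances only, not absolute ones.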
