Microbiome Datasets Are Compositional: And This Is Not Optional
- PMID: 29187837
- PMCID: PMC5695134
- DOI: 10.3389/fmicb.2017.02224
Abstract
Datasets collected by high-throughput sequencing (HTS) of 16S rRNA gene amplimers, metagenomes, or metatranscriptomes are commonplace and are used to study human disease states, ecological differences between sites, and the built environment. There is increasing awareness that microbiome datasets generated by HTS are compositional because they have an arbitrary total imposed by the instrument. However, many investigators are either unaware of this or assume specific properties of the compositional data. The purpose of this review is to alert investigators to the dangers inherent in ignoring the compositional nature of the data, and to point out that HTS datasets derived from microbiome studies can and should be treated as compositions at all stages of analysis. We briefly introduce compositional data, illustrate the pathologies that occur when compositional data are analyzed inappropriately, and finally give guidance and point to resources and examples for the analysis of microbiome datasets using compositional data analysis.
Keywords: Bayesian estimation; compositional data; correlation; count normalization; high-throughput sequencing; microbiota; relative abundance.
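The point about an arbitrary total can be made concrete with a small sketch. The example below is illustrative only and is not taken from the review: the abundance values are hypothetical, the helper names `closure` and `clr` are our own, and the pseudocount of 0.5 is an assumed stand-in for a proper zero-handling step. It shows that when only one taxon changes in absolute terms, the proportions of every taxon shift once the total is fixed, while ratios between taxa are preserved. The centred log-ratio (CLR) transform, a standard tool of compositional data analysis, works from those ratios.

```python
import numpy as np


def closure(counts):
    """Rescale each sample (row) so its parts sum to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def clr(counts, pseudocount=0.5):
    """Centred log-ratio: log of each part minus the row mean of the logs.

    The pseudocount is an assumption made here to avoid log(0);
    it is not a recommendation from the review.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    return log_x - log_x.mean(axis=1, keepdims=True)


# Hypothetical absolute abundances for two samples and three taxa.
# Only taxon 0 differs between the samples.
absolute = np.array([[100.0, 200.0, 300.0],
                     [400.0, 200.0, 300.0]])

# What a sequencer effectively reports: parts of a fixed total.
proportions = closure(absolute)

# Taxa 1 and 2 did not change, yet their proportions fall in sample 2.
print(proportions)

# Ratios between taxa survive the closure unchanged.
ratio_abs = absolute[:, 1] / absolute[:, 2]
ratio_prop = proportions[:, 1] / proportions[:, 2]
print(ratio_abs, ratio_prop)

# CLR values sum to zero within each sample.
print(clr(absolute).sum(axis=1))
```

With the pseudocount set to zero, `clr` returns the same values for `absolute` and for any rescaled copy such as `absolute * 10`, which is the scale invariance that makes log-ratio methods suitable for data with an arbitrary total.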
