Purpose: Reference databases play a pivotal role in amplicon microbiome research; however, these databases differ in their sequence content and in the taxonomic information available. Studies on mock communities and human health microbiomes have revealed that the choice of reference database can affect the outcome. Nonetheless, the influence of reference databases in environmental microbiome studies has not been explicitly illustrated.
Methods: This study analyzed amplicon data (V1V3, V3V4, V4V5 and V6V8 regions) from 128 soil samples and evaluated the impact of four 16S rRNA reference databases, namely the Genome Taxonomy Database (GTDB), the Ribosomal Database Project (RDP), SILVA and Consensus Taxonomy (ConTax), on microbiome inference.
Results: The distribution of observed amplicon sequence variants differed significantly (P-value < 2.647e-12) across the four datasets generated with the different databases for each amplicon region. Beta diversity was also altered by the choice of database. Further investigation revealed that the microbiome composition inferred with the various databases differed significantly (P-value = 0.001), irrespective of amplicon region. The study also found that the core-microbiome structure in environmental studies is influenced by the type of reference database used.
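To make the idea of a database-dependent composition concrete, the following minimal sketch computes a Bray-Curtis dissimilarity between two taxonomic profiles of the same sample. It is an illustration only: the abstract does not state which dissimilarity metric or statistical test the study used, and the function name `bray_curtis`, the database labels and all counts are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles.

    Each profile is a dict mapping a taxon name to a read count.
    Returns 0.0 for identical profiles and 1.0 for profiles with no shared taxa.
    """
    taxa = set(u) | set(v)
    # Sum of absolute count differences over every taxon seen in either profile
    numerator = sum(abs(u.get(t, 0) - v.get(t, 0)) for t in taxa)
    # Total counts across both profiles
    denominator = sum(u.get(t, 0) + v.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0


# Hypothetical genus-level counts for ONE sample, classified against two
# different reference databases (made-up values, not data from the study).
profile_db_a = {"Bacillus": 40, "Pseudomonas": 30, "Unclassified": 30}
profile_db_b = {"Bacillus": 40, "Pseudomonas": 20, "Streptomyces": 10, "Unclassified": 30}

print(bray_curtis(profile_db_a, profile_db_b))
```

In this toy case the same reads yield a non-zero dissimilarity purely because the two hypothetical databases assign ten reads to different genera, which is the kind of effect the Results describe at the community level.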
Conclusion: In summary, this study illustrates that the choice of reference database could influence the outcome of environmental microbiome research.
Keywords: 16S rRNA; Amplicon microbiome; Core microbiome; Environmental microorganisms; Reference database; Taxonomy inference.
© 2022. The Author(s), under exclusive licence to Springer Nature B.V.