Tourmaline: A containerized workflow for rapid and iterable amplicon sequence analysis using QIIME 2 and Snakemake
- PMID: 35902092
- PMCID: PMC9334028
- DOI: 10.1093/gigascience/giac066
Abstract
Background: Amplicon sequencing (metabarcoding) is a common method for surveying the diversity of environmental communities, whereby a single genetic locus is amplified and sequenced from the DNA of whole or partial organisms, organismal traces (e.g., skin, mucus, feces), or microbes in an environmental sample. Several software packages exist for analyzing amplicon data, among which QIIME 2 has emerged as a popular option because of its broad functionality, plugin architecture, provenance tracking, and interactive visualizations. However, each new analysis requires the user to keep track of input and output file names, parameters, and commands; this lack of automation and standardization is inefficient and creates barriers to meta-analysis and sharing of results.
Findings: We developed Tourmaline, a Python-based workflow that implements QIIME 2 and is built using the Snakemake workflow management system. Starting from a configuration file that defines parameters and input files (a reference database, a sample metadata file, and a manifest or archive of FASTQ sequences), it uses QIIME 2 to run either the DADA2 or Deblur denoising algorithm; assigns taxonomy to the resulting representative sequences; performs analyses of taxonomic, alpha, and beta diversity; and generates an HTML report summarizing and linking to the output files. Features include support for multiple cores, automatic determination of trimming parameters using quality scores, representative sequence filtering (taxonomy, length, abundance, prevalence, or ID), support for multiple taxonomic classification and sequence alignment methods, outlier detection, and automated initialization of a new analysis using previous settings. The workflow runs natively on Linux and macOS or via a Docker container. We ran Tourmaline on a 16S ribosomal RNA amplicon data set from Lake Erie surface water, showing its utility for parameter optimization and the ability to easily view interactive visualizations through the HTML report, QIIME 2 viewer, and R- and Python-based Jupyter notebooks.
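To make the representative sequence filtering step more concrete, the following is a minimal Python sketch of filtering by length, abundance, prevalence, and ID. It is an illustration only and is not Tourmaline's or QIIME 2's implementation: the `RepSeq` class, the `filter_repseqs` function, the parameter names, and the threshold semantics (abundance as a fraction of all reads, prevalence as a fraction of samples) are all assumptions introduced here. Taxonomy-based filtering is omitted.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass
class RepSeq:
    """Hypothetical container for one representative sequence."""
    seq_id: str
    sequence: str
    counts: Dict[str, int]  # sample ID -> read count


def filter_repseqs(
    repseqs: List[RepSeq],
    min_length: int = 0,
    max_length: Optional[int] = None,
    min_abundance: float = 0.0,   # assumed: fraction of all reads
    min_prevalence: float = 0.0,  # assumed: fraction of samples
    exclude_ids: FrozenSet[str] = frozenset(),
) -> List[RepSeq]:
    """Keep sequences that pass every filter (illustrative logic only)."""
    samples = {s for r in repseqs for s in r.counts}
    total_reads = sum(sum(r.counts.values()) for r in repseqs)
    kept = []
    for r in repseqs:
        length = len(r.sequence)
        reads = sum(r.counts.values())
        abundance = reads / total_reads if total_reads else 0.0
        n_present = sum(1 for c in r.counts.values() if c > 0)
        prevalence = n_present / len(samples) if samples else 0.0
        if r.seq_id in exclude_ids:
            continue
        if length < min_length:
            continue
        if max_length is not None and length > max_length:
            continue
        if abundance < min_abundance:
            continue
        if prevalence < min_prevalence:
            continue
        kept.append(r)
    return kept


# Toy data: four sequences across three samples (made up for illustration).
toy = [
    RepSeq("asv1", "ACGT" * 60, {"s1": 50, "s2": 40, "s3": 10}),
    RepSeq("asv2", "ACGT" * 10, {"s1": 30, "s2": 30, "s3": 0}),   # too short
    RepSeq("asv3", "ACGT" * 62, {"s1": 1, "s2": 0, "s3": 0}),     # too rare
    RepSeq("asv4", "ACGT" * 61, {"s1": 20, "s2": 19, "s3": 0}),
]

kept = filter_repseqs(
    toy,
    min_length=200,
    max_length=300,
    min_abundance=0.01,
    min_prevalence=0.5,
    exclude_ids=frozenset({"asv4"}),
)
kept_ids = [r.seq_id for r in kept]
```

In this toy example, only `asv1` passes every filter; dropping the `exclude_ids` argument would also retain `asv4`. In an actual analysis, thresholds like these would be set in the workflow's configuration file rather than hard-coded.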
Conclusion: Automated workflows like Tourmaline enable rapid analysis of environmental amplicon data, decreasing the time from data generation to actionable results. Tourmaline is available for download at github.com/aomlomics/tourmaline.
Keywords: amplicon sequencing; eDNA; environmental DNA; meta-analysis; metabarcoding; microbiome.
© The Author(s) 2022. Published by Oxford University Press GigaScience.
