Gigascience. 2022 Jul 28:11:giac066.
doi: 10.1093/gigascience/giac066.

Tourmaline: A containerized workflow for rapid and iterable amplicon sequence analysis using QIIME 2 and Snakemake

Luke R Thompson et al. Gigascience. 2022.

Abstract

Background: Amplicon sequencing (metabarcoding) is a common method to survey diversity of environmental communities whereby a single genetic locus is amplified and sequenced from the DNA of whole or partial organisms, organismal traces (e.g., skin, mucus, feces), or microbes in an environmental sample. Several software packages exist for analyzing amplicon data, among which QIIME 2 has emerged as a popular option because of its broad functionality, plugin architecture, provenance tracking, and interactive visualizations. However, each new analysis requires the user to keep track of input and output file names, parameters, and commands; this lack of automation and standardization is inefficient and creates barriers to meta-analysis and sharing of results.

Findings: We developed Tourmaline, a Python-based workflow that implements QIIME 2 and is built using the Snakemake workflow management system. Starting from a configuration file that defines parameters and input files (a reference database, a sample metadata file, and a manifest or archive of FASTQ sequences), it uses QIIME 2 to run either the DADA2 or Deblur denoising algorithm; assigns taxonomy to the resulting representative sequences; performs analyses of taxonomic, alpha, and beta diversity; and generates an HTML report summarizing and linking to the output files. Features include support for multiple cores, automatic determination of trimming parameters using quality scores, representative sequence filtering (by taxonomy, length, abundance, prevalence, or ID), support for multiple taxonomic classification and sequence alignment methods, outlier detection, and automated initialization of a new analysis using previous settings. The workflow runs natively on Linux and macOS or via a Docker container. We ran Tourmaline on a 16S ribosomal RNA amplicon data set from Lake Erie surface water, showing its utility for parameter optimization and the ability to easily view interactive visualizations through the HTML report, QIIME 2 viewer, and R- and Python-based Jupyter notebooks.

Conclusion: Automated workflows like Tourmaline enable rapid analysis of environmental amplicon data, decreasing the time from data generation to actionable results. Tourmaline is available for download at github.com/aomlomics/tourmaline.

Keywords: amplicon sequencing; eDNA; environmental DNA; meta-analysis; metabarcoding; microbiome.

Figures

Figure 1: The Tourmaline workflow. Install natively (macOS, Linux) or using a Docker container. Set up by cloning the Tourmaline repository (directory) from GitHub, initializing the directory from a previous run (optional), editing the configuration file (config.yaml, Supplementary Table S1), creating symbolic links to the reference database files, organizing the sequence files and/or editing the FASTQ manifest file, and editing and creating a symbolic link to the metadata file. Run by calling the Snakemake commands for denoise, taxonomy, diversity, and report, or by running just the report command to generate all output if the parameters do not need to be changed between individual commands. It is recommended but not required to run the unfiltered commands before the filtered commands. The primary input and output files are listed. Detailed instructions for each step are provided in the Tourmaline Wiki [44].
Figure 2: Step-by-step tutorial on Tourmaline using the provided test data, which are subsampled from the 16S rRNA amplicon data of a 2018 survey of Western Lake Erie. Key parameters in config.yaml and primary output for each command (pseudo-rule) are listed. Indicated output should be evaluated to determine the appropriate parameters for the next command. Evaluation of the primary outputs and rationale for parameter choice are shown for the test Lake Erie 16S rRNA data that come with the Tourmaline repository. See Supplementary Fig. S3 for screenshots of the primary output files.
Figure 3: Example of the main outputs of the Tourmaline workflow beyond the QIIME 2 outputs. Contents in panels A, E, F, and G are truncated. Screenshots of additional output files are provided in Supplementary Fig. S3. See Fig. 2 for commands, parameters, and guidance.

Cited by

  • A pile of pipelines: An overview of the bioinformatics software for metabarcoding data analyses.
    Hakimzadeh A, Abdala Asbun A, Albanese D, Bernard M, Buchner D, Callahan B, Caporaso JG, Curd E, Djemiel C, Brandström Durling M, Elbrecht V, Gold Z, Gweon HS, Hajibabaei M, Hildebrand F, Mikryukov V, Normandeau E, Özkurt E, M Palmer J, Pascal G, Porter TM, Straub D, Vasar M, Větrovský T, Zafeiropoulos H, Anslan S. Mol Ecol Resour. 2023 Aug 7:10.1111/1755-0998.13847. doi: 10.1111/1755-0998.13847. Online ahead of print. PMID: 37548515
