The topological nature of tag jumping in environmental DNA metabarcoding studies

Mol Ecol Resour. 2023 Apr;23(3):621-631. doi: 10.1111/1755-0998.13745. Epub 2023 Jan 6.

Abstract

Metabarcoding of environmental DNA constitutes a state-of-the-art tool for environmental studies. One fundamental principle implicit in most metabarcoding studies is that individual sample amplicons can still be identified after being pooled with others (based on their unique combinations of tags) during the so-called demultiplexing step that follows sequencing. Nevertheless, it has been recognized that tags can sometimes be changed (i.e., tag jumping), which ultimately leads to sample crosstalk. Here, using four DNA metabarcoding data sets derived from the analysis of soils and sediments, we show that tag jumping follows very specific and systematic patterns. Specifically, we find a strong correlation between the number of reads in blank samples and their topological position in the tag matrix (described by vertical and horizontal vectors). This observed spatial pattern of artefactual sequences could be explained by polymerase activity, which leads to the exchange of the 3' tag of single-stranded tagged sequences through the formation of heteroduplexes with mixed barcodes. Importantly, tag jumping substantially distorted our data sets, despite our use of methods suggested to minimize this error. We developed a topological model to estimate the noise based on the counts in our blanks, which suggested that 40%-80% of the taxa in our soil and sedimentary samples were likely false positives introduced through tag jumping. We highlight that the number of false-positive detections caused by tag jumping strongly biased our community analyses.
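The abstract does not give the equations of the topological model, so the following is only a minimal illustrative sketch of the idea, not the authors' model. It assumes, hypothetically, that the tag matrix has forward tags as rows and reverse tags as columns, and that the reads leaking into a well for one taxon are a linear function of the reads sharing its row (the "horizontal vector") and its column (the "vertical vector"). The two jump rates are fitted on blank wells by ordinary least squares, and any non-blank detection not exceeding the predicted noise is flagged as a possible false positive. All function names, the linear form, and the toy plate are assumptions made for illustration.

```python
import numpy as np


def row_col_totals(counts):
    """Reads sharing each well's forward tag (row) or reverse tag (column),
    excluding the well itself. Hypothetical predictors for tag-jumping noise."""
    row = counts.sum(axis=1, keepdims=True) - counts
    col = counts.sum(axis=0, keepdims=True) - counts
    return row, col


def fit_noise_model(counts, blank_mask):
    """Fit blank counts ~ a * row + b * col by ordinary least squares.

    Assumed linear model for illustration only. Negative rates are crudely
    clipped to zero afterwards (this is not a true non-negative fit)."""
    row, col = row_col_totals(counts)
    X = np.column_stack([row[blank_mask], col[blank_mask]])
    y = counts[blank_mask]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.clip(coef, 0.0, None)


def flag_false_positives(counts, blank_mask):
    """Return (suspect, expected_noise): a non-blank detection is suspect
    when its read count does not exceed the noise predicted for its well."""
    a, b = fit_noise_model(counts, blank_mask)
    row, col = row_col_totals(counts)
    expected_noise = a * row + b * col
    suspect = (counts > 0) & (counts <= expected_noise) & ~blank_mask
    return suspect, expected_noise


# Toy 3x3 plate for one taxon: rows = forward tags, columns = reverse tags.
plate = np.array([
    [1000.0, 2000.0, 32.65],
    [3000.0, 4000.0, 30.0],
    [40.0, 1500.0, 500.0],
])
blanks = np.zeros(plate.shape, dtype=bool)
blanks[0, 2] = blanks[2, 0] = True  # two blank wells

suspect, noise = flag_false_positives(plate, blanks)
print(suspect)  # only the 30-read detection at row 1, column 2 is flagged
```

In this toy plate the blank counts were constructed to fit rates of 0.01 (row) and 0.005 (column) exactly, so the fit recovers them; the 30-read detection sits in a row carrying 7,000 other reads, and its predicted noise (about 73 reads) exceeds its own count. With real data one would fit per taxon or pool across taxa, and a constrained solver would be preferable to clipping.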

Keywords: a-DNA; detection limits; e-DNA; false positive; index hopping; sample crosstalk.

MeSH terms

  • DNA / genetics
  • DNA Barcoding, Taxonomic / methods
  • DNA, Environmental*
  • Sequence Analysis, DNA / methods

Substances

  • DNA, Environmental
  • DNA