mSphere. 2021 Aug 25;6(4):e0019121.
doi: 10.1128/mSphere.00191-21. Epub 2021 Jul 21.

Amplicon Sequence Variants Artificially Split Bacterial Genomes into Separate Clusters

Patrick D Schloss. mSphere.

Abstract

Amplicon sequence variants (ASVs) have been proposed as an alternative to operational taxonomic units (OTUs) for analyzing microbial communities. ASVs have grown in popularity, in part because of a desire to reflect a more refined level of taxonomy, since they do not cluster sequences based on a distance-based threshold. However, ASVs and the use of overly narrow thresholds to identify OTUs increase the risk of splitting a single genome into separate clusters. To assess this risk, I analyzed the intragenomic variation of 16S rRNA genes from the bacterial genomes represented in an rrn copy number database, which contained 20,427 genomes from 5,972 species. As the number of copies of the 16S rRNA gene increased in a genome, the number of ASVs also increased. There was an average of 0.58 ASVs per copy of the 16S rRNA gene for full-length 16S rRNA genes. It was necessary to use a distance threshold of 5.25% to cluster full-length ASVs from the same genome into a single OTU with 95% confidence for genomes with 7 copies of the 16S rRNA gene, such as Escherichia coli. This research highlights the risk of splitting a single bacterial genome into separate clusters when ASVs are used to analyze 16S rRNA gene sequence data. Although there is also a risk of clustering ASVs from different species into the same OTU when using broad distance thresholds, this risk is of less concern than artificially splitting a genome into separate ASVs and OTUs.

IMPORTANCE 16S rRNA gene sequencing has engendered significant interest in studying microbial communities. There has been tension between trying to classify 16S rRNA gene sequences to increasingly lower taxonomic levels and the reality that those levels were defined using more sequence and physiological information than is available from a fragment of the 16S rRNA gene. Furthermore, the naming of bacterial taxa reflects the biases of those who name them. One motivation for the recent push to adopt ASVs in place of OTUs in microbial community analyses is to allow researchers to perform their analyses at the finest possible level that reflects species-level taxonomy. The current research is significant because it quantifies the risk of artificially splitting bacterial genomes into separate clusters. Far from providing a better representation of bacterial taxonomy and biology, the ASV approach can lead to conflicting inferences about the ecology of different ASVs from the same genome.
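
The two per-genome quantities discussed in the abstract (the number of ASVs among a genome's 16S rRNA gene copies, and the distance threshold needed to pull those ASVs into one OTU) can be illustrated with a minimal Python sketch. This is not the author's pipeline. It assumes the copies are already aligned to equal length, that distance is the fraction of mismatched positions, and that clustering behaves like furthest-neighbor linkage, so the required threshold equals the largest pairwise distance. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def count_asvs(copies):
    """Number of distinct sequences among a genome's 16S rRNA gene copies."""
    return len(set(copies))


def pairwise_distance(a, b):
    """Fraction of mismatched positions between two aligned sequences.

    Assumption: both sequences come from the same alignment, so they
    have equal length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def merge_threshold(copies):
    """Smallest threshold that places all of a genome's ASVs in one OTU.

    Assumption: furthest-neighbor clustering, where a cluster forms only
    if every pair of members is within the threshold.
    """
    asvs = sorted(set(copies))
    if len(asvs) < 2:
        return 0.0
    return max(pairwise_distance(a, b) for a, b in combinations(asvs, 2))


# Hypothetical genome with four gene copies, two of them identical.
toy_genome = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]

n_asvs = count_asvs(toy_genome)
asvs_per_copy = n_asvs / len(toy_genome)
threshold = merge_threshold(toy_genome)

print(n_asvs, asvs_per_copy, threshold)
```

In this toy genome the four copies yield three ASVs, and the two most divergent ASVs differ at 2 of 10 positions, so any threshold below 0.2 would split the genome across more than one OTU under the stated linkage assumption.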

Keywords: 16S rRNA gene; ASV; OTU; bioinformatics; microbial communities; microbial ecology; microbiome.

Figures

FIG 1
The distance threshold required to prevent the splitting of genomes into multiple OTUs increased as the number of rrn operons in the genome increased. Each line represents the median distance threshold for each region of the 16S rRNA gene that is required for 95% of the genomes with the indicated number of rrn operons to cluster their ASVs to a single OTU. The median distance threshold was calculated across 100 randomizations in which one genome was sampled from each species. Only those numbers of rrn operons that were found in more than 100 species are included.
FIG 2
As the distance threshold used to define an OTU increased, the percentages of ASVs and OTUs representing multiple species increased. These data represent the median fractions for both measurements across 100 randomizations. In each randomization, one genome was sampled from each species.
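
The summary statistics named in the two captions can be sketched in Python as follows. This is an illustration under stated assumptions, not the published analysis code. It assumes each genome has already been reduced to a tuple of (species, number of rrn operons, threshold needed to merge its ASVs), uses a nearest-rank 95th percentile, and represents cluster membership as (cluster id, species) pairs. All names and the input layout are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def percentile_95(values):
    """Nearest-rank 95th percentile: the smallest observed value that
    is greater than or equal to at least 95% of the values."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def median_threshold(genomes, copy_number, n_randomizations=100, seed=0):
    """Median, across randomizations, of the threshold at which 95% of
    genomes with `copy_number` operons cluster into a single OTU.

    genomes: list of (species, n_operons, merge_threshold) tuples
    (hypothetical layout). One genome is sampled per species per run.
    """
    rng = random.Random(seed)
    by_species = defaultdict(list)
    for species, n_operons, threshold in genomes:
        by_species[species].append((n_operons, threshold))

    per_run = []
    for _ in range(n_randomizations):
        sampled = [rng.choice(members) for members in by_species.values()]
        thresholds = [t for n, t in sampled if n == copy_number]
        if thresholds:
            per_run.append(percentile_95(thresholds))
    return statistics.median(per_run) if per_run else None


def fraction_multi_species(assignments):
    """Fraction of clusters (ASVs or OTUs) containing more than one species.

    assignments: iterable of (cluster_id, species) pairs, one per sequence.
    """
    species_by_cluster = defaultdict(set)
    for cluster_id, species in assignments:
        species_by_cluster[cluster_id].add(species)
    if not species_by_cluster:
        return 0.0
    mixed = sum(1 for s in species_by_cluster.values() if len(s) > 1)
    return mixed / len(species_by_cluster)
```

Sampling one genome per species in each run keeps heavily sequenced species from dominating the estimate, which is the role the captions give to the 100 randomizations.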
