Environ Microbiol, 12 (7), 1889-98

Ironing Out the Wrinkles in the Rare Biosphere Through Improved OTU Clustering



Susan M Huse et al. Environ Microbiol.


Deep sequencing of PCR amplicon libraries facilitates the detection of low-abundance populations in environmental DNA surveys of complex microbial communities. At the same time, deep sequencing can lead to overestimates of microbial diversity through the generation of low-frequency, error-prone reads. Even with sequencing error rates below 0.005 per nucleotide position, the common method of generating operational taxonomic units (OTUs) by multiple sequence alignment and complete-linkage clustering significantly increases the number of predicted OTUs and inflates richness estimates. We show that a 2% single-linkage preclustering methodology followed by an average-linkage clustering based on pairwise alignments more accurately predicts expected OTUs in both single and pooled template preparations of known taxonomic composition. This new clustering method can reduce the OTU richness in environmental samples by as much as 30-60% but does not reduce the fraction of OTUs in long-tailed rank abundance curves that defines the rare biosphere.
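The two-stage approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain edit distance normalized by the longer read stands in for a pairwise alignment, preclustering visits unique reads from most to least abundant, and the final OTU threshold (`otu_threshold`, set to 0.03 here) is an assumed parameter that the abstract does not specify. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance; an assumed stand-in for a pairwise alignment."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def pairwise_distance(a, b):
    """Edit distance as a fraction of the longer sequence length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def single_linkage_precluster(reads, threshold=0.02):
    """Greedy single-linkage preclustering, most abundant unique reads first."""
    counts = Counter(reads)
    preclusters = []  # each entry is a list of unique sequences
    for seq, _ in counts.most_common():
        for members in preclusters:
            # Single linkage: being close to ANY one member is enough to join.
            if any(pairwise_distance(seq, m) <= threshold for m in members):
                members.append(seq)
                break
        else:
            preclusters.append([seq])
    return preclusters, counts


def average_linkage_cluster(seqs, threshold):
    """Agglomerative clustering that merges while the mean distance <= threshold."""
    clusters = [[s] for s in seqs]
    dist = {frozenset(p): pairwise_distance(*p) for p in combinations(seqs, 2)}

    def avg(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg(clusters[p[0]], clusters[p[1]]))
        if avg(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


def cluster_otus(reads, pre_threshold=0.02, otu_threshold=0.03):
    """Precluster, then average-linkage cluster the precluster representatives."""
    preclusters, counts = single_linkage_precluster(reads, pre_threshold)
    # The first member of each precluster is its most abundant sequence.
    size = {m[0]: sum(counts[s] for s in m) for m in preclusters}
    otus = average_linkage_cluster(list(size), otu_threshold)
    return [(otu, sum(size[rep] for rep in otu)) for otu in otus]


if __name__ == "__main__":
    template = "ACGT" * 25
    noisy = template[:10] + "T" + template[11:]  # one substitution in 100 nt
    other = "TTGCA" * 20
    for otu, n_reads in cluster_otus([template] * 10 + [noisy] + [other] * 5):
        print(len(otu), "representative(s),", n_reads, "reads")
```

In this toy input the single-error read is absorbed into the precluster of its abundant parent before OTU clustering, which is the mechanism by which preclustering is expected to suppress low-frequency, error-derived OTUs.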


Fig. 1
Effect of clustering method on the number of OTUs. We created OTU clusters of the three known template preparations using combinations of multiple sequence and pairwise alignments, complete-linkage and average-linkage clustering, and single-linkage preclustering. Each method yields a distinctly different number of OTUs for the same data. For short hypervariable tags sequenced at depth, single-linkage preclustering using pairwise alignments followed by average-linkage clustering (SLP / PW-AL) provides the most accurate results.
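The difference between the linkage criteria compared in Fig. 1 can be made concrete with a small sketch. The distances and the 0.03 merge threshold below are invented for illustration and do not come from the paper; `linkage_distance` is a hypothetical helper. Complete linkage scores two clusters by their most distant pair of members, average linkage by the mean over all pairs, so a single outlying (for example, error-bearing) read can block a merge under complete linkage that average linkage would still allow.

```python
def linkage_distance(cluster_a, cluster_b, dist, method):
    """Distance between two clusters under the named linkage criterion."""
    pairs = [dist[frozenset((a, b))] for a in cluster_a for b in cluster_b]
    if method == "single":
        return min(pairs)
    if method == "complete":
        return max(pairs)
    if method == "average":
        return sum(pairs) / len(pairs)
    raise ValueError(f"unknown linkage method: {method}")


# Illustrative distances only (assumed values, not measured data).
dist = {
    frozenset(("s1", "s3")): 0.01,
    frozenset(("s2", "s3")): 0.04,
}
threshold = 0.03  # assumed merge threshold

for method in ("complete", "average"):
    d = linkage_distance(["s1", "s2"], ["s3"], dist, method)
    verdict = "merge" if d <= threshold else "keep separate"
    print(method, verdict)
```

With these values the two clusters stay separate under complete linkage and merge under average linkage, which is consistent with complete linkage producing more OTUs from the same data.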
Fig. 2
Number of additional OTUs as a function of sample depth. For the two genomic templates, E. coli and S. epidermidis, and the multiple template Clone-43 samples, we calculated the number of spurious OTUs as a function of sample depth.
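The calculation behind Fig. 2 can be sketched as follows, assuming that "spurious" means the OTUs observed in a random subsample beyond the number expected from the known templates. The function name, the subsampling scheme, and the `count_otus` callback are assumptions made for illustration; any clustering routine that returns an OTU count could be passed in.

```python
import random


def spurious_otus_by_depth(reads, depths, expected_otus, count_otus, seed=0):
    """Map each sampling depth to the number of OTUs beyond the expected count."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    results = {}
    for depth in depths:
        if depth > len(reads):
            continue  # cannot subsample more reads than are available
        subsample = rng.sample(reads, depth)
        # Clamp at zero: a shallow subsample may miss some expected templates.
        results[depth] = max(count_otus(subsample) - expected_otus, 0)
    return results


if __name__ == "__main__":
    # Toy data: one true template plus a few reads carrying one error each.
    reads = ["ACGT"] * 90 + ["ACGA"] * 10

    def count_unique(sample):
        # Simplest possible stand-in: every unique read counts as its own OTU.
        return len(set(sample))

    print(spurious_otus_by_depth(reads, [10, 50, 100], 1, count_unique))
```

Counting unique reads is the crudest possible stand-in for clustering; substituting a real clustering function for `count_unique` would show how each method changes the number of spurious OTUs as depth increases.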



