ISME J. 2012 Jul;6(7):1440-4.
doi: 10.1038/ismej.2011.208. Epub 2012 Jan 12.

Selection of Primers for Optimal Taxonomic Classification of Environmental 16S rRNA Gene Sequences

Free PMC article

David A W Soergel et al. ISME J. 2012.


Microbial community profiling using 16S rRNA gene sequences requires accurate taxonomy assignments. 'Universal' primers target conserved sequences and amplify sequences from many taxa, but they provide variable coverage of different environments, and regions of the rRNA gene differ in taxonomic informativeness, especially when high-throughput short-read sequencing technologies (for example, 454 and Illumina) are used. We introduce a new evaluation procedure that provides an improved measure of expected taxonomic precision when classifying environmental sequence reads from a given primer. Applying this measure to thousands of combinations of primers and read lengths, simulating single-ended and paired-end sequencing, reveals that these choices greatly affect taxonomic informativeness. The most informative sequence region may differ by environment, partly due to variable coverage of different environments in reference databases. Using our Rtax method of classifying paired-end reads, we found that paired-end sequencing provides substantial benefit in some environments, including human gut, but not in others. Optimal primer choice for short reads totaling 96 nt provides 82-100% of the confident genus classifications available from longer reads.
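As an illustration of what simulating a single-ended read from a given primer and read length can look like, the following is a minimal Python sketch. It is not the authors' procedure: it assumes exact primer matching (with IUPAC ambiguity codes expanded) against a full-length sequence, and the function names, the toy primer and the toy sequence are all hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Compile a (possibly degenerate) primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def simulate_read(sequence, primer, read_length):
    """Return up to read_length nt downstream of the first primer match.

    Returns None when the primer does not match (a "primer miss").
    Assumption: exact matching only; real primers tolerate some mismatches.
    """
    sequence = sequence.upper()
    match = primer_to_regex(primer).search(sequence)
    if match is None:
        return None
    return sequence[match.end():match.end() + read_length]


# Toy data (hypothetical): "Y" in the primer matches either C or T.
toy_sequence = "GGACGCTTTGACA"
toy_primer = "ACGYT"
read = simulate_read(toy_sequence, toy_primer, 4)
miss = simulate_read("GGGGGG", toy_primer, 4)
```

A paired-end simulation would repeat the same step from a second primer on the reverse-complemented strand; that step is omitted here to keep the sketch short.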


Figure 1
Classification performance, at three levels of estimated accuracy (Supplementary Methods), of 6617 possible choices of amplification primer, sequencing primer and read length for single-ended reads from different environments (left portion of each panel) and 3061 possible choices of primer pair and read length for paired-end reads (right portion). Combinations of primers and read lengths are sorted on the x axis according to a measure of overall classification performance (Supplementary Methods). Stacked bars show the proportion of non-chimeric, non-unique sequences from each sample (not the proportion of the total sample) that can be classified to each taxonomic level for each combination. See Supplementary Figure S1 and Supplementary Table S1 for the excluded proportion of novel (and thus a priori unclassifiable) sequences in each sample. The top of each colored section indicates how much of the sample can be classified to the given level or better. 'Primer miss' (black) indicates sequences that did not match a given primer and so would not be amplified. Classifications more specific than the genus level are exceedingly rare and so are not visible here. Horizontal lines indicate the maximum proportion of each sample classifiable to the genus level using 96 nt or less of sequence (i.e., with an optimal choice of primer or primer pair; see also Supplementary Tables S4 and S5), showing that short reads from the best primers frequently, but not always, provide taxonomic information nearly matching that obtained from longer read lengths. Full-size versions of these panels are available in the supplementary data.
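The "classified to the given level or better" reading of the stacked bars can be sketched in Python as follows. This is an illustrative reconstruction, not code from the paper: the rank list, the function name and the toy sample are assumptions, and each sequence is represented only by the most specific rank it reached (or None for a primer miss or an unclassified sequence).

```python
# Ranks ordered from least to most specific (assumed ordering for illustration).
RANKS = ["phylum", "class", "order", "family", "genus"]


def level_or_better(deepest_ranks):
    """Fraction of sequences classified to each rank or a more specific one.

    deepest_ranks: one entry per sequence, giving the most specific rank
    reached, or None for a primer miss or an unclassified sequence.
    Primer misses stay in the denominator, as they occupy part of each bar.
    """
    total = len(deepest_ranks)
    if total == 0:
        return {rank: 0.0 for rank in RANKS}
    fractions = {}
    for i, rank in enumerate(RANKS):
        at_or_below = RANKS[i:]  # this rank and every more specific rank
        count = sum(1 for r in deepest_ranks if r in at_or_below)
        fractions[rank] = count / total
    return fractions


# Toy sample (hypothetical): five sequences, one of them a primer miss.
toy_sample = ["genus", "genus", "family", "phylum", None]
fractions = level_or_better(toy_sample)
```

In this toy sample the genus fraction is 2 of 5 sequences, while the phylum fraction counts every classified sequence (4 of 5), which is what the top of the phylum-colored section would show.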



