Accurate taxonomy assignments from 16S rRNA sequences produced by highly parallel pyrosequencers

Nucleic Acids Res. 2008 Oct;36(18):e120. doi: 10.1093/nar/gkn491. Epub 2008 Aug 22.


The recent introduction of massively parallel pyrosequencers allows rapid, inexpensive analysis of microbial community composition using 16S ribosomal RNA (rRNA) sequences. However, a major challenge is to design a workflow so that taxonomic information can be accurately and rapidly assigned to each read, so that the composition of each community can be linked back to likely ecological roles played by members of each species, genus, family or phylum. Here, we use three large 16S rRNA datasets to test whether taxonomic information based on the full-length sequences can be recaptured by short reads that simulate the pyrosequencer outputs. We find that different taxonomic assignment methods vary radically in their ability to recapture the taxonomic information in full-length 16S rRNA sequences: most methods are sensitive to the region of the 16S rRNA gene that is targeted for sequencing, but many combinations of methods and rRNA regions produce consistent and accurate results. To process large datasets of partial 16S rRNA sequences obtained from surveys of various microbial communities, including those from human body habitats, we recommend the use of Greengenes or RDP classifier with fragments of at least 250 bases, starting from one of the primers R357, R534, R798, F343 or F517.
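The read-simulation test described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the function names (`simulate_read`, `agreement`), the use of a plain 0-based index for the primer site, and the exact-match comparison of taxon labels are all assumptions introduced here. The abstract does not give primer coordinates, so locating a primer such as R357 or F517 in a real sequence would need an alignment step that is not shown.

```python
# Hypothetical sketch: cut simulated short reads out of full-length 16S rRNA
# sequences and measure how often their taxonomy matches the full-length call.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_read(full_seq: str, start: int, direction: str, length: int = 250) -> str:
    """Extract a simulated read of up to `length` bases.

    `start` is an assumed 0-based index of the primer site in `full_seq`.
    "F" reads extend toward the 3' end; "R" reads extend toward the 5' end
    and are returned as the reverse complement, as a sequencer would report.
    """
    if direction == "F":
        return full_seq[start:start + length]
    if direction == "R":
        begin = max(0, start - length)
        return reverse_complement(full_seq[begin:start])
    raise ValueError("direction must be 'F' or 'R'")


def agreement(full_length_taxa: list[str], read_taxa: list[str]) -> float:
    """Fraction of sequences whose read-based taxon equals the full-length taxon."""
    if not full_length_taxa or len(full_length_taxa) != len(read_taxa):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(full_length_taxa, read_taxa))
    return matches / len(full_length_taxa)
```

In use, each simulated read would be passed to a classifier (for example Greengenes or the RDP classifier, run separately), and `agreement` would then be computed per taxonomic rank for each combination of primer and read length.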

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Animals
  • Bacteria / classification*
  • Bacteria / genetics
  • Classification / methods
  • Humans
  • Metagenome
  • Mice
  • Phylogeny*
  • RNA, Ribosomal, 16S / genetics*
  • Reproducibility of Results
  • Sequence Alignment
  • Sequence Analysis, DNA*


Substances

  • RNA, Ribosomal, 16S