Comparing bacterial communities inferred from 16S rRNA gene sequencing and shotgun metagenomics

Pac Symp Biocomput. 2011;165-76. doi: 10.1142/9789814335058_0018.


16S rRNA gene sequencing has been widely used for probing the species structure of a variety of environmental bacterial communities. Alternatively, 16S rRNA gene fragments can be retrieved from shotgun metagenomic sequences (metagenomes) and used for species profiling. Both approaches have their limitations: 16S rRNA sequencing may be biased because of unequal amplification of species' 16S rRNA genes, whereas shotgun metagenomic sequencing may not be deep enough to detect the 16S rRNA genes of rare species in a community. However, previous studies showed that these two approaches give largely similar species profiles for a few bacterial communities. To investigate this problem in greater detail, we conducted a systematic comparison of these two approaches. We developed PHYLOSHOP, a pipeline that predicts 16S rRNA gene fragments in metagenomes, reports the taxonomic assignment of these fragments, and visualizes their taxonomy distribution. Using PHYLOSHOP, we analyzed 33 metagenomic datasets of human-associated bacterial communities, and compared the bacterial community structures derived from these metagenomic datasets with those derived from 16S rRNA gene sequencing (71 datasets). Based on several statistical tests (including a statistical test proposed here that takes into consideration differences in sample size), we observed that these two approaches give significantly different community structures for nearly all the bacterial communities collected from different locations on and in the human body, and that these differences cannot be explained by differences in sample size and are likely attributable to the experimental method.
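
To make the idea of a size-aware comparison concrete, the sketch below shows one generic way to test whether two taxonomic profiles differ when they are built from very different numbers of reads. It is not the test proposed in the paper, which the abstract does not specify. It is a plain permutation test on the total variation distance between relative-abundance profiles, and the function names and the phylum counts are hypothetical, chosen only for illustration. Because each permutation reassigns the pooled reads to two groups of the original sizes, the null distribution reflects the sampling noise expected at those sizes.

```python
import random
from collections import Counter


def total_variation(counts_a, counts_b):
    """Total variation distance between two relative-abundance profiles."""
    taxa = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    return 0.5 * sum(
        abs(counts_a.get(t, 0) / total_a - counts_b.get(t, 0) / total_b)
        for t in taxa
    )


def permutation_test(counts_a, counts_b, n_perm=1000, seed=0):
    """Permutation p-value for the observed distance between two profiles.

    Assumption (not from the paper): under the null hypothesis, every read
    is exchangeable between the two datasets, so reads are pooled and
    re-split into groups of the original sizes.
    """
    rng = random.Random(seed)
    # Expand counts into one taxon label per read, then pool both datasets.
    reads = list(Counter(counts_a).elements()) + list(Counter(counts_b).elements())
    n_a = sum(counts_a.values())
    observed = total_variation(counts_a, counts_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(reads)
        perm_a = Counter(reads[:n_a])
        perm_b = Counter(reads[n_a:])
        if total_variation(perm_a, perm_b) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical phylum-level read counts, for illustration only.
    amplicon = {"Bacteroidetes": 600, "Firmicutes": 350, "Proteobacteria": 50}
    shotgun = {"Bacteroidetes": 30, "Firmicutes": 60, "Proteobacteria": 10}
    distance, p_value = permutation_test(amplicon, shotgun)
    print(f"distance={distance:.3f}, p={p_value:.4f}")
```

A small p-value here would indicate that the two profiles differ by more than resampling at the two read depths can explain, which is the kind of conclusion the abstract draws. The choice of distance and of permutation scheme is an assumption of this sketch.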

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Bacteria / classification
  • Bacteria / genetics
  • Computational Biology
  • Databases, Nucleic Acid / statistics & numerical data
  • Genes, Bacterial*
  • Genes, rRNA*
  • Humans
  • Metagenome*
  • Metagenomics / statistics & numerical data*
  • Phylogeny
  • RNA, Bacterial / genetics
  • RNA, Ribosomal, 16S / genetics
  • Sequence Analysis, RNA / statistics & numerical data
  • Software

Substances

  • RNA, Bacterial
  • RNA, Ribosomal, 16S