Background: The metagenomic analysis of microbial communities holds the potential to improve our understanding of the role of microbes in clinical conditions. Recent, dramatic improvements in DNA sequencing throughput and cost will enable such analyses on individuals. However, such advances in throughput generally come at the cost of shorter read-lengths, limiting the discriminatory power of each read. In particular, classifying the microbial content of samples by sequencing the < 1,600 bp 16S rRNA gene will be affected by such limitations.
Results: We describe a method for identifying the phylogenetic content of bacterial samples using high-throughput Pyrosequencing targeted at the 16S rRNA gene. Our analysis is adapted to the shorter read-lengths of such technology and uses a database of 16S rDNA to determine the most specific phylogenetic classification for reads, resulting in a weighted phylogenetic tree characterizing the content of the sample. We present results for six samples obtained from the human vagina during pregnancy that corroborates previous studies using conventional techniques.Next, we analyze the power of our method to classify reads at each level of the phylogeny using simulation experiments. We assess the impacts of read-length and database completeness on our method, and predict how we do as technology improves and more bacteria are sequenced. Finally, we study the utility of targeting specific 16S variable regions and show that such an approach considerably improves results for certain types of microbial samples. Using simulation, our method can be used to determine the most informative variable region.
Conclusion: This study provides positive validation of the effectiveness of targeting 16S metagenomes using short-read sequencing technology. Our methodology allows us to infer the most specific assignment of the sequence reads within the phylogeny, and to identify the most discriminative variable region to target. The analysis of high-throughput Pyrosequencing on human flora samples will accelerate the study of the relationship between the microbial world and ourselves.