PHACCS, an online tool for estimating the structure and diversity of uncultured viral communities using metagenomic information

BMC Bioinformatics. 2005 Mar 2;6:41. doi: 10.1186/1471-2105-6-41.


Background: Phages, viruses that infect prokaryotes, are the most abundant microbes in the world. A major limitation to studying these viruses is the difficulty of cultivating the appropriate prokaryotic hosts. One way around this limitation is to directly clone and sequence shotgun libraries of uncultured viral communities (i.e., metagenomic analyses). PHACCS, Phage Communities from Contig Spectrum, is an online bioinformatic tool to assess the biodiversity of uncultured viral communities. PHACCS uses the contig spectrum from shotgun DNA sequence assemblies to mathematically model the structure of viral communities and make predictions about diversity.

Results: PHACCS builds models of possible community structure using a modified Lander-Waterman algorithm to predict the underlying contig spectrum. PHACCS finds the most appropriate structure model by optimizing the model parameters until the predicted contig spectrum is as close as possible to the experimental one. This model is the basis for estimating the richness, evenness and diversity index of the uncultured viral community, as well as the abundance of its most abundant genotype.
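The abstract does not spell out how the Lander-Waterman model is modified. As a minimal sketch, assuming that every genotype has the same genome length, that each genotype receives reads in proportion to its relative abundance, that the community follows a power-law rank-abundance curve, and that the fit minimizes a sum of squared differences over a parameter grid, the predict-then-optimize loop could look like the Python below. The function names, the grid search and all numeric settings are hypothetical illustrations, not PHACCS code.

```python
import math

# Sketch only: a simplified stand-in for the approach described above,
# not the PHACCS implementation.


def power_law_abundances(richness, exponent):
    """Relative abundance of each genotype, proportional to rank**(-exponent)."""
    weights = [rank ** (-exponent) for rank in range(1, richness + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def lander_waterman_spectrum(n_reads, read_len, genome_len, min_overlap, max_size):
    """Expected number of contigs built from exactly j reads (j = 1..max_size)
    for a single genome, using the classic Lander-Waterman island formula
    N * exp(-2*sigma) * (1 - exp(-sigma))**(j - 1), sigma = N*(L - T)/G.
    Contigs with more than max_size reads are not counted."""
    sigma = n_reads * (read_len - min_overlap) / genome_len
    q = math.exp(-sigma)
    return [n_reads * q * q * (1 - q) ** (j - 1) for j in range(1, max_size + 1)]


def predicted_contig_spectrum(abundances, total_reads, read_len, genome_len,
                              min_overlap, max_size):
    """Sum the per-genotype spectra; genotype i receives total_reads * p_i reads.
    Assumes every genotype has the same genome length and that contigs never
    span two genotypes."""
    spectrum = [0.0] * max_size
    for p in abundances:
        per_genotype = lander_waterman_spectrum(
            total_reads * p, read_len, genome_len, min_overlap, max_size)
        for j, count in enumerate(per_genotype):
            spectrum[j] += count
    return spectrum


def fit_power_law(observed, total_reads, read_len, genome_len, min_overlap,
                  richness_grid, exponent_grid):
    """Grid search for the (richness, exponent) pair whose predicted spectrum
    has the smallest sum of squared differences from the observed one."""
    best, best_err = None, math.inf
    for richness in richness_grid:
        for exponent in exponent_grid:
            predicted = predicted_contig_spectrum(
                power_law_abundances(richness, exponent), total_reads,
                read_len, genome_len, min_overlap, len(observed))
            err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
            if err < best_err:
                best, best_err = (richness, exponent), err
    return best, best_err


if __name__ == "__main__":
    # Illustrative settings only: 1000 reads of 700 bp, 50 kb genomes,
    # 20 bp minimum overlap, spectrum recorded for contigs of 1-6 reads.
    args = (1000, 700, 50_000, 20)
    # Synthetic "observed" spectrum generated from the model itself, so the
    # fit has a known answer to recover.
    observed = predicted_contig_spectrum(power_law_abundances(500, 1.5), *args, 6)
    best, err = fit_power_law(observed, *args, range(100, 1001, 100),
                              [0.5, 1.0, 1.5, 2.0, 2.5])
    print(best)  # (500, 1.5)
```

Because the synthetic spectrum comes from the model itself, the grid search simply recovers the generating parameters; with a real contig spectrum the residual would be nonzero, and a continuous optimizer could replace the coarse grid.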

Conclusion: PHACCS analysis of four different environmental phage communities suggests that the power law is an important rank-abundance form for describing uncultured viral community structure. The estimates indicate that the four phage communities were extremely diverse and that phage community biodiversity and structure may be correlated with those of their hosts.
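Once a rank-abundance model has been fitted, the estimates listed in the Results follow directly from its relative abundances. The sketch below assumes a power-law model, the Shannon-Wiener index with natural logarithms as the diversity index, and Pielou's evenness (H'/ln S); the abstract does not name the exact indices PHACCS reports, and the helper names are hypothetical.

```python
import math

# Sketch only: assumed index definitions, not the PHACCS implementation.


def power_law_abundances(richness, exponent):
    """Relative abundance of each genotype, proportional to rank**(-exponent)."""
    weights = [rank ** (-exponent) for rank in range(1, richness + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def diversity_summary(abundances):
    """Richness, Shannon-Wiener index (natural log), Pielou evenness and the
    relative abundance of the most abundant genotype."""
    richness = len(abundances)
    shannon = -sum(p * math.log(p) for p in abundances if p > 0)
    # Pielou evenness is undefined for a single genotype.
    evenness = shannon / math.log(richness) if richness > 1 else float("nan")
    return {
        "richness": richness,
        "shannon": shannon,
        "evenness": evenness,
        "max_abundance": max(abundances),
    }


if __name__ == "__main__":
    # Hypothetical parameters: the same richness with a flat (exponent 0)
    # and a steep (exponent 2) power law.
    for exponent in (0.0, 2.0):
        print(exponent, diversity_summary(power_law_abundances(1000, exponent)))
```

At a fixed richness, a steeper power-law exponent concentrates reads in the top-ranked genotypes, which lowers evenness and the Shannon index while raising the abundance of the dominant genotype.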

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Bacteriophages / metabolism
  • Biodiversity
  • Computational Biology / methods*
  • Contig Mapping
  • DNA / chemistry
  • DNA Viruses
  • Databases, Genetic
  • Genes, Viral
  • Genetic Variation
  • Genome, Viral
  • Genotype
  • Internet
  • Models, Genetic
  • Models, Statistical
  • Protein Interaction Mapping / methods*
  • Sequence Analysis, DNA
  • Software*
  • Viruses / metabolism*

Substances

  • DNA