Predicting prokaryotic ecological niches using genome sequence analysis

PLoS One. 2007 Aug 15;2(8):e743. doi: 10.1371/journal.pone.0000743.

Abstract

Automated DNA sequencing technology is so rapid that analysis has become the rate-limiting step. Hundreds of prokaryotic genome sequences are publicly available, with new genomes uploaded at the rate of approximately 20 per month. As a result, this growing body of genome sequences will include microorganisms not previously identified, isolated, or observed. We hypothesize that evolutionary pressure exerted by an ecological niche selects for a similar genetic repertoire in those prokaryotes that occupy the same niche, and that this is due to both vertical and horizontal transmission. To test this, we have developed a novel method to classify prokaryotes, by calculating their Pfam protein domain distributions and clustering them with all other sequenced prokaryotic species. Clusters of organisms are visualized in two dimensions as 'mountains' on a topological map. When compared to a phylogenetic map constructed using 16S rRNA, this map more accurately clusters prokaryotes according to functional and environmental attributes. We demonstrate the ability of this map, which we term a "niche map", to cluster according to ecological niche both quantitatively and qualitatively, and propose that this method be used to associate uncharacterized prokaryotes with their ecological niche as a means of predicting their functional role directly from their genome sequence.
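
The clustering idea described above can be illustrated with a minimal sketch. The abstract does not state which distance measure or clustering algorithm was used, so the cosine distance, the average-linkage agglomerative clustering, and the cutoff of 0.3 below are all assumptions chosen for illustration. The genome names and domain labels (`dom_A` and so on) are hypothetical toy data, not real Pfam annotations.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def domain_distribution(domain_hits):
    """Convert a list of protein domain hits into relative frequencies."""
    counts = Counter(domain_hits)
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()}


def cosine_distance(p, q):
    """1 - cosine similarity of two sparse frequency profiles (assumed metric)."""
    dot = sum(value * q.get(domain, 0.0) for domain, value in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / (norm_p * norm_q)


def cluster(profiles, threshold):
    """Average-linkage agglomerative clustering (assumed algorithm).

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold`.
    """
    clusters = [[name] for name in profiles]

    def linkage(a, b):
        # Mean pairwise distance between members of two clusters.
        pairs = [(x, y) for x in a for y in b]
        return sum(cosine_distance(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return [sorted(c) for c in clusters]


# Hypothetical toy genomes: each is a list of domain hits found in its proteins.
genomes = {
    "soil_1": ["dom_A"] * 5 + ["dom_B"] * 3 + ["dom_C"],
    "soil_2": ["dom_A"] * 4 + ["dom_B"] * 4 + ["dom_C"],
    "gut_1": ["dom_C"] * 6 + ["dom_D"] * 3,
    "gut_2": ["dom_C"] * 5 + ["dom_D"] * 4 + ["dom_A"],
}

profiles = {name: domain_distribution(hits) for name, hits in genomes.items()}
groups = cluster(profiles, threshold=0.3)
print(groups)
```

On this toy input the two "soil" genomes and the two "gut" genomes fall into separate groups because their domain frequency profiles are similar within each pair and dissimilar across pairs. An uncharacterized genome could be added to `genomes` in the same way, and the group it joins would suggest a candidate niche.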

MeSH terms

  • Base Sequence
  • Biological Evolution
  • Cluster Analysis
  • Ecology
  • Ecosystem*
  • Gammaproteobacteria / classification
  • Gammaproteobacteria / genetics
  • Genome*
  • Humans
  • Molecular Sequence Data
  • Phylogeny
  • Prokaryotic Cells / classification
  • Prokaryotic Cells / physiology*
  • RNA, Ribosomal, 16S / genetics
  • Sequence Alignment
  • Sequence Analysis, DNA / methods*

Substances

  • RNA, Ribosomal, 16S