Assigning protein functions by comparative genome analysis: protein phylogenetic profiles

Proc Natl Acad Sci U S A. 1999 Apr 13;96(8):4285-8. doi: 10.1073/pnas.96.8.4285.


Determining protein functions from genomic sequences is a central goal of bioinformatics. We present a method based on the assumption that proteins that function together in a pathway or structural complex are likely to evolve in a correlated fashion. During evolution, all such functionally linked proteins tend to be either preserved or eliminated in a new species. We describe this property of correlated evolution by characterizing each protein by its phylogenetic profile, a string that encodes the presence or absence of a protein in every known genome. We show that proteins having matching or similar profiles strongly tend to be functionally linked. This method of phylogenetic profiling allows us to predict the function of uncharacterized proteins.
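The profile idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the genome names, protein names, and presence/absence data are invented toy values. The use of Hamming distance with a `max_diff` threshold as the measure of "similar" is also an assumption, since the abstract does not specify one. Deciding whether a protein is present in a genome would in practice require a homology search, which is not shown here.

```python
from itertools import combinations

# Toy presence/absence data (hypothetical; not taken from the paper).
GENOMES = ["genome_A", "genome_B", "genome_C", "genome_D"]
PRESENCE = {
    "P1": {"genome_A", "genome_C"},
    "P2": {"genome_A", "genome_C"},
    "P3": {"genome_A", "genome_B", "genome_C"},
    "P4": {"genome_B", "genome_D"},
}


def profile(protein):
    """Return the profile as a bit string, one character per genome in GENOMES."""
    return "".join("1" if g in PRESENCE[protein] else "0" for g in GENOMES)


def hamming(a, b):
    """Count the genomes in which two equal-length profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(max_diff=1):
    """List protein pairs whose profiles differ in at most max_diff genomes.

    max_diff is an illustrative threshold, not a value from the paper.
    """
    profiles = {p: profile(p) for p in PRESENCE}
    return [
        (p, q, hamming(profiles[p], profiles[q]))
        for p, q in combinations(sorted(profiles), 2)
        if hamming(profiles[p], profiles[q]) <= max_diff
    ]


if __name__ == "__main__":
    for name in sorted(PRESENCE):
        print(name, profile(name))
    print(linked_pairs(max_diff=1))
```

With this toy data, P1 and P2 have identical profiles and P3 differs from each of them in a single genome, so those three would be flagged as candidate functional partners, while P4 would not. If P1 were the only uncharacterized protein, its function would be guessed from the known functions of P2 and P3.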

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Bacterial Proteins / chemistry
  • Bacterial Proteins / genetics
  • Escherichia coli / genetics*
  • Escherichia coli Proteins
  • Evolution, Molecular*
  • Genome*
  • Genome, Bacterial*
  • Models, Biological
  • Open Reading Frames
  • Phylogeny*
  • Proteins / chemistry*
  • Proteins / genetics
  • Ribosomal Proteins / chemistry


Substances

  • Bacterial Proteins
  • Escherichia coli Proteins
  • Proteins
  • Ribosomal Proteins
  • rplL protein, E coli