Evolutionary trace annotation of protein function in the structural proteome

J Mol Biol. 2010 Mar 12;396(5):1451-73. doi: 10.1016/j.jmb.2009.12.037. Epub 2009 Dec 28.


By design, structural genomics (SG) solves many structures that cannot be assigned a function based on homology to known proteins. Alternative function annotation methods are therefore needed, and this study focuses on function prediction with three-dimensional (3D) templates: small structural motifs built of just a few functionally critical residues. Although experimentally proven functional residues are scarce, we show here that Evolutionary Trace (ET) rankings of residue importance are sufficient to build 3D templates, match them, and then assign Gene Ontology (GO) functions in enzymes and non-enzymes alike. In a high-specificity mode, this Evolutionary Trace Annotation (ETA) method covered about half (53%) of the 2384 annotated SG protein controls. Three-quarters (76%) of predictions were both correct and complete. The positive predictive value over all GO depths (all-depth PPV) was 84%, and it rose to 94% over GO depths 1-3 (depth 3 PPV). In a high-sensitivity mode, coverage rose markedly to 84%, while accuracy fell moderately: 68% of predictions were both correct and complete, all-depth PPV was 75%, and depth 3 PPV was 86%. These data concur with prior mutational experiments showing that ET rank information identifies key functional determinants in proteins. In practice, ETA predicted functions in 42% of 3461 unannotated SG proteins. In 529 cases (including 280 non-enzymes and 21 for metal ion ligands), the expected accuracy is 84% at any GO depth and 94% down to GO depth 3, while for the remaining 931 the expected accuracies are 60% and 71%, respectively. Thus, local structural comparisons of evolutionarily important residues can help decipher protein functions to known reliability levels and without prior assumptions about functional mechanisms. ETA is available at http://mammoth.bcm.tmc.edu/eta.
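
The two accuracy measures quoted above, all-depth PPV and depth 3 PPV, can be illustrated with a minimal sketch. It assumes that PPV is the usual ratio of true positives to all predictions, counted per predicted GO term, and that GO depth is counted from 1 just below the ontology root. The abstract does not spell out either convention, so both are assumptions. The `PredictedTerm` record, the `ppv` helper, and the correctness flags in the toy data are hypothetical and are not taken from the ETA software or its results.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PredictedTerm:
    """One predicted GO term (hypothetical record layout, not ETA's own)."""
    go_id: str     # GO identifier of the predicted term
    depth: int     # assumed convention: 1 = directly below the ontology root
    correct: bool  # True if the term agrees with the known annotation


def ppv(terms: Iterable[PredictedTerm],
        max_depth: Optional[int] = None) -> Optional[float]:
    """Positive predictive value, TP / (TP + FP), over predicted terms.

    If max_depth is given, only terms at depth <= max_depth are counted.
    Returns None when no term is selected, since the ratio is undefined.
    """
    selected = [t for t in terms if max_depth is None or t.depth <= max_depth]
    if not selected:
        return None
    true_positives = sum(1 for t in selected if t.correct)
    return true_positives / len(selected)


# Toy data: the correctness flags are invented for illustration only.
predictions = [
    PredictedTerm("GO:0003824", 1, True),   # catalytic activity
    PredictedTerm("GO:0016787", 2, True),   # hydrolase activity
    PredictedTerm("GO:0016788", 3, True),   # hydrolase, acting on ester bonds
    PredictedTerm("GO:0004518", 4, False),  # nuclease activity
]

all_depth_ppv = ppv(predictions)           # every predicted term counts
depth3_ppv = ppv(predictions, max_depth=3)  # only GO depths 1-3 count

print(f"all-depth PPV: {all_depth_ppv:.2f}")  # prints "all-depth PPV: 0.75"
print(f"depth 3 PPV:   {depth3_ppv:.2f}")     # prints "depth 3 PPV:   1.00"
```

In this toy case the only wrong term is the most specific one, so restricting the count to depths 1-3 raises the PPV. This is the same direction as the reported rise from 84% to 94%, although the sketch says nothing about why the real predictions behave that way.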

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Animals
  • Databases, Protein
  • Enzymes / chemistry
  • Enzymes / genetics
  • Evolution, Molecular*
  • Genomics
  • Humans
  • Models, Molecular
  • Protein Conformation
  • Proteins / chemistry*
  • Proteins / genetics*
  • Proteins / metabolism
  • Proteome*
  • Sequence Alignment
  • Sequence Analysis, Protein
  • Sequence Homology, Amino Acid
  • Structural Homology, Protein


Substances

  • Enzymes
  • Proteins
  • Proteome