Structure-guided rule-based annotation of protein functional sites in UniProt knowledgebase

Methods Mol Biol. 2011;694:91-105. doi: 10.1007/978-1-60761-977-2_7.


The rapid growth of protein sequence databases has necessitated the development of methods to computationally derive annotation for uncharacterized entries. Most such methods focus on "global" annotation, such as molecular function or biological process. Methods that use structural information to supply high-accuracy "local" annotation of functional sites at the level of individual amino acids are relatively rare. In this chapter, we describe a method we have developed for annotating functional residues within experimentally uncharacterized proteins. The method relies on position-specific site annotation rules (PIR Site Rules) derived from structural and experimental information. These PIR Site Rules are manually defined to allow for conditional propagation of annotation. Each rule specifies a tripartite set of conditions: a candidate for annotation must pass a whole-protein classification test (that is, have an end-to-end match to a whole-protein-based HMM), match a site-specific profile HMM, and, finally, match the functionally and structurally characterized residues of a template. Positive matches trigger the appropriate annotation for active site residues, binding site residues, modified residues, or other functionally important amino acids. The strict criteria used in this process yield high-confidence annotation suitable for UniProtKB/Swiss-Prot features.
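
The tripartite test described in the abstract can be illustrated with a minimal Python sketch. It is not the PIR implementation. All names here (`SiteRule`, `Candidate`, `apply_rule`) are hypothetical. The two HMM checks are assumed to have been run elsewhere and are represented only as booleans, and the template-to-candidate alignment is assumed to be given as a position map. The all-or-nothing handling of template residues is also an assumption, and the sequence and positions are toy data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SiteRule:
    """Hypothetical stand-in for one site rule."""
    feature: str                       # annotation to propagate, e.g. "active site"
    template_residues: Dict[int, str]  # template position (1-based) -> required amino acid


@dataclass
class Candidate:
    """Hypothetical candidate protein with precomputed match results."""
    sequence: str
    passes_family_hmm: bool            # end-to-end match to the whole-protein HMM
    passes_site_hmm: bool              # match to the site-specific profile HMM
    template_to_candidate: Dict[int, int]  # alignment: template pos -> candidate pos (1-based)


def apply_rule(rule: SiteRule, cand: Candidate) -> List[Tuple[int, str, str]]:
    """Return (position, residue, feature) annotations, or [] if any condition fails."""
    # Condition 1: whole-protein classification test.
    if not cand.passes_family_hmm:
        return []
    # Condition 2: site-specific profile HMM match.
    if not cand.passes_site_hmm:
        return []
    # Condition 3: every characterized template residue must be conserved
    # at the aligned candidate position (assumed all-or-nothing here).
    annotations = []
    for template_pos, required_aa in rule.template_residues.items():
        cand_pos = cand.template_to_candidate.get(template_pos)
        if cand_pos is None or not 1 <= cand_pos <= len(cand.sequence):
            return []
        if cand.sequence[cand_pos - 1] != required_aa:
            return []
        annotations.append((cand_pos, required_aa, rule.feature))
    return sorted(annotations)


# Toy data: two cysteines required at template positions 32 and 35.
rule = SiteRule(feature="active site", template_residues={32: "C", 35: "C"})
cand = Candidate(
    sequence="MKWCGPCKA",
    passes_family_hmm=True,
    passes_site_hmm=True,
    template_to_candidate={32: 4, 35: 7},
)
print(apply_rule(rule, cand))
```

In this sketch, a candidate that fails any one of the three conditions receives no annotation at all, which mirrors the strict, conditional propagation the abstract describes.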

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Amino Acid Sequence
  • Amino Acids / chemistry*
  • Computational Biology / methods*
  • Coproporphyrinogen Oxidase / chemistry
  • Coproporphyrinogen Oxidase / metabolism
  • Databases, Protein*
  • Escherichia coli / metabolism
  • Knowledge Bases*
  • Molecular Sequence Annotation / methods*
  • Molecular Sequence Data
  • Proteins / chemistry*
  • Thioredoxins / chemistry
  • Thioredoxins / metabolism


Substances

  • Amino Acids
  • Proteins
  • Thioredoxins
  • Coproporphyrinogen Oxidase