Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches

J Mol Biol. 1999 Apr 16;287(5):1023-40. doi: 10.1006/jmbi.1999.2653.


Using a number of diverse protein families as test cases, we investigate the ability of the recently developed iterative sequence database search method, PSI-BLAST, to identify subtle relationships between proteins that were originally deemed detectable only by structure-structure comparison. We show that PSI-BLAST can detect many, though not all, such relationships, but that success depends critically on the choice of the query sequence used to initiate the search. Generally, there is a correlation between the diversity of the sequences detected in the first pass of database screening and the ability of a given query to detect subtle relationships in subsequent iterations. Accordingly, maximizing the chances of gleaning non-trivial structural and functional inferences requires a thorough analysis of protein superfamilies at the sequence level, rather than a single search initiated, for example, with the sequence of a protein whose structure is available.
This strategy is illustrated by several findings, each of which involves an unexpected structural prediction: (i) a number of previously undetected proteins with the HSP70-actin fold are identified, including a highly conserved and nearly ubiquitous family of metal-dependent proteases (typified by bacterial O-sialoglycoprotease) that represents an adaptation of this fold to a new type of enzymatic activity; (ii) contrary to previous conclusions, ATP-dependent and NAD-dependent DNA ligases are confidently predicted to possess the same fold; (iii) the C-terminal domain of 3-phosphoglycerate dehydrogenase, which binds serine and is involved in allosteric regulation of the enzyme activity, is shown to typify a new superfamily of ligand-binding, regulatory domains found primarily in enzymes and regulators of amino acid and purine metabolism; (iv) the immunoglobulin-like DNA-binding domain previously identified in the structures of transcription factors NFkappaB and NFAT is shown to be a member of a distinct superfamily of intracellular and extracellular domains with the immunoglobulin fold; and (v) the Rag-2 subunit of the V-D-J recombinase is shown to contain a kelch-type beta-propeller domain, which rules out its evolutionary relationship with bacterial transposases.
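
The query-selection idea in the abstract (prefer queries whose first-pass hits are most diverse) can be illustrated with a minimal sketch. This is not the authors' procedure: the diversity measure (mean of 100 minus percent identity to the query), the function names `first_pass_diversity` and `rank_queries`, and the example identity values are all assumptions made for illustration, and no PSI-BLAST run is performed here.

```python
from statistics import mean


def first_pass_diversity(identities):
    """Hypothetical diversity score for one query.

    `identities` holds the percent identity of each first-pass hit to the
    query. The score is the mean divergence (100 - percent identity);
    a query with no hits scores 0.0. This measure is an assumption, not
    one defined in the paper.
    """
    if not identities:
        return 0.0
    return mean(100.0 - pid for pid in identities)


def rank_queries(first_pass_hits):
    """Order candidate queries from most to least diverse first-pass hit set."""
    return sorted(
        first_pass_hits,
        key=lambda query: first_pass_diversity(first_pass_hits[query]),
        reverse=True,
    )


# Made-up first-pass results: query name -> percent identities of its hits.
hits = {
    "query_A": [95.0, 90.0, 88.0],        # only close homologs found
    "query_B": [80.0, 45.0, 30.0, 25.0],  # hits spanning a wide range
    "query_C": [],                        # nothing found in the first pass
}

ranked = rank_queries(hits)
print(ranked)
```

Under this assumed score, the top-ranked queries would be the ones worth iterating further; in practice the first-pass hit lists would come from parsed search output rather than hand-entered values.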

MeSH terms

  • Actins / chemistry
  • Adenosine Triphosphate / metabolism
  • Amino Acid Sequence
  • Binding Sites
  • DNA Ligases / chemistry
  • DNA Ligases / metabolism
  • DNA-Binding Proteins / chemistry
  • DNA-Binding Proteins / metabolism
  • Databases, Factual
  • Evolution, Molecular*
  • HSP70 Heat-Shock Proteins / chemistry
  • Immunoglobulins / chemistry
  • Immunoglobulins / physiology
  • Information Storage and Retrieval*
  • Models, Molecular
  • Molecular Sequence Data
  • NAD / metabolism
  • Protein Folding
  • Proteins / chemistry*
  • Proteins / physiology
  • Sequence Homology, Amino Acid
  • Software
  • Transcription Factors / chemistry
  • Transcription Factors / metabolism


Substances

  • Actins
  • DNA-Binding Proteins
  • HSP70 Heat-Shock Proteins
  • Immunoglobulins
  • Proteins
  • Transcription Factors
  • V(D)J recombination activating protein 2
  • NAD
  • Adenosine Triphosphate
  • DNA Ligases