Protein domain identification and improved sequence similarity searching using PSI-BLAST

Proteins. 2002 Sep 1;48(4):672-81. doi: 10.1002/prot.10175.


Protein sequences containing more than one structural domain are problematic in homology searches, where they can either stop an iterative database search prematurely or cause the search to explode to common domains. We describe a method, DOMAINATION, that infers domains and their boundaries in a query sequence from local gapped alignments generated using PSI-BLAST. Using a new technique to recognize domain insertions and permutations, DOMAINATION submits the delineated domains as successive database queries in further iterative steps. Assessed over a set of 452 multidomain proteins, the method predicts structural domain boundaries with an overall accuracy of 50% and improves the detection of distant homologies by 14% compared with PSI-BLAST. DOMAINATION is available as a web-based tool, and the source code is available from the authors upon request.
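The idea of inferring domain boundaries from where local alignments start and stop on the query can be illustrated with a small sketch. This is not the published DOMAINATION algorithm, whose details are not given in the abstract; it is a simplified stand-in that clusters alignment termini and reports well-supported clusters as candidate boundaries. The function names (`candidate_boundaries`, `split_query`), the input format (1-based inclusive query coordinates of each local alignment), and all thresholds are assumptions made for illustration only.

```python
from statistics import median_low


def candidate_boundaries(query_len, alignments, min_support=3,
                         max_gap=10, min_domain_len=30):
    """Return candidate domain boundaries on the query (hypothetical helper).

    alignments: list of (start, end) query coordinates, 1-based inclusive,
    e.g. parsed from PSI-BLAST local alignments. Thresholds are illustrative.
    """
    # Collect alignment termini that fall inside the query, ignoring those
    # too close to the sequence ends to delimit a plausible domain.
    termini = []
    for start, end in alignments:
        if start > min_domain_len:
            termini.append(start)
        if end <= query_len - min_domain_len:
            termini.append(end)
    termini.sort()

    # Group termini that lie within max_gap residues of their neighbour.
    clusters = []
    for pos in termini:
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])

    # Keep clusters supported by enough termini; report their low median.
    return [median_low(c) for c in clusters if len(c) >= min_support]


def split_query(query_len, boundaries):
    """Turn boundaries into (start, end) segments covering the query."""
    segments, start = [], 1
    for b in sorted(boundaries):
        segments.append((start, b))
        start = b + 1
    segments.append((start, query_len))
    return segments


# Toy input: alignments pile up on either side of residue ~150.
alns = [(1, 148), (1, 150), (3, 152), (151, 300), (155, 298), (149, 300)]
bounds = candidate_boundaries(300, alns)
print(bounds)                    # [150]
print(split_query(300, bounds))  # [(1, 150), (151, 300)]
```

Each resulting segment could then be used as a separate query in a further search round, which is the general strategy the abstract describes; handling of domain insertions and permutations is not covered by this sketch.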

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Computational Biology / methods
  • Databases, Protein*
  • Protein Structure, Tertiary*
  • Proteins / chemistry*
  • Repetitive Sequences, Amino Acid
  • Reproducibility of Results
  • Sequence Alignment
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid


Substances

  • Proteins