Detection of homologous proteins by an intermediate sequence search

Protein Sci. 2004 Jan;13(1):54-62. doi: 10.1110/ps.03335004.

Abstract

We developed a variant of the intermediate sequence search method (ISS(new)) for detection and alignment of weakly similar pairs of protein sequences. ISS(new) relates two query sequences by an intermediate sequence that is potentially homologous to both queries. The improvement was achieved by a more robust overlap score for a match between the queries through an intermediate. The approach was benchmarked on a data set of 2369 sequences of known structure with insignificant sequence similarity to each other (BLAST E-value larger than 0.001); 2050 of these sequences had a related structure in the set. ISS(new) performed significantly better than both PSI-BLAST and a previously described intermediate sequence search method. PSI-BLAST could not detect correct homologs for 1619 of the 2369 sequences. In contrast, ISS(new) assigned a correct homolog as the top hit for 121 of these 1619 sequences, while incorrectly assigning homologs for only nine targets; it did not assign homologs for the remainder of the sequences. By estimate, ISS(new) may be able to assign the folds of domains in approximately 29,000 of the approximately 500,000 sequences unassigned by PSI-BLAST, with 90% specificity (1 - false positives fraction). In addition, we show that the 15 alignments with the most significant BLAST E-values include the nearly best alignments constructed by ISS(new).

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Computers
  • Databases, Factual
  • False Negative Reactions
  • False Positive Reactions
  • Molecular Sequence Data
  • Protein Folding
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Homology, Amino Acid
  • Software

Substances

  • Proteins