Detection of homologous proteins by an intermediate sequence search

Bino John; Andrej Sali

doi:10.1110/ps.03335004

Protein Sci. 2004 Jan;13(1):54-62. doi: 10.1110/ps.03335004.

Authors

Bino John¹, Andrej Sali

Affiliation

¹ Laboratory of Molecular Biophysics, Pels Family Center for Biochemistry and Structural Biology, The Rockefeller University, New York, New York 10021, USA.

Abstract

We developed a variant of the intermediate sequence search method (ISS(new)) for detection and alignment of weakly similar pairs of protein sequences. ISS(new) relates two query sequences by an intermediate sequence that is potentially homologous to both queries. The improvement was achieved by a more robust overlap score for a match between the queries through an intermediate. The approach was benchmarked on a data set of 2369 sequences of known structure with insignificant sequence similarity to each other (BLAST E-value larger than 0.001); 2050 of these sequences had a related structure in the set. ISS(new) performed significantly better than both PSI-BLAST and a previously described intermediate sequence search method. PSI-BLAST could not detect correct homologs for 1619 of the 2369 sequences. In contrast, ISS(new) assigned a correct homolog as the top hit for 121 of these 1619 sequences, while incorrectly assigning homologs for only nine targets; it did not assign homologs for the remainder of the sequences. By estimate, ISS(new) may be able to assign the folds of domains in approximately 29,000 of the approximately 500,000 sequences unassigned by PSI-BLAST, with 90% specificity (1 - false positives fraction). In addition, we show that the 15 alignments with the most significant BLAST E-values include the nearly best alignments constructed by ISS(new).
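The core idea of the abstract, relating two queries through an intermediate sequence that both of them hit, can be illustrated with a minimal sketch. The abstract does not define the overlap score, so the one below (shared residues on the intermediate divided by the shorter of the two aligned regions) is an assumption chosen for illustration, not the ISS(new) score. The names Hit, overlap_fraction, link_queries and the min_overlap threshold are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """A significant match of one query against one intermediate sequence."""
    intermediate: str  # identifier of the intermediate sequence
    start: int         # first aligned residue on the intermediate (1-based, inclusive)
    end: int           # last aligned residue on the intermediate (inclusive)
    e_value: float     # significance of the query-to-intermediate match


def overlap_fraction(a: Hit, b: Hit) -> float:
    """Illustrative overlap score (an assumption, not the published one):
    residues of the intermediate covered by both hits, divided by the
    length of the shorter aligned region."""
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    shorter = min(a.end - a.start + 1, b.end - b.start + 1)
    return shared / shorter


def link_queries(
    hits_a: Iterable[Hit],
    hits_b: Iterable[Hit],
    min_overlap: float = 0.5,  # hypothetical threshold
) -> Optional[Tuple[str, float]]:
    """Return (intermediate id, score) for the best-overlapping shared
    intermediate, or None when no intermediate links the two queries."""
    by_intermediate = {}
    for hb in hits_b:
        by_intermediate.setdefault(hb.intermediate, []).append(hb)

    best = None
    for ha in hits_a:
        for hb in by_intermediate.get(ha.intermediate, []):
            score = overlap_fraction(ha, hb)
            if score >= min_overlap and (best is None or score > best[1]):
                best = (ha.intermediate, score)
    return best
```

In this sketch the two hit lists would come from separate searches of each query against a sequence database; requiring the two aligned regions to overlap on the intermediate guards against linking queries that match different domains of a multidomain intermediate.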

Publication types

Comparative Study
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Algorithms
Amino Acid Sequence
Computers
Databases, Factual
False Negative Reactions
False Positive Reactions
Molecular Sequence Data
Protein Folding
Protein Structure, Tertiary
Proteins / chemistry*
Reproducibility of Results
Sensitivity and Specificity
Sequence Alignment / methods*
Sequence Homology, Amino Acid
Software

Substances

Proteins
