Improving the sensitivity of the sequence profile method

R Lüthy; I Xenarios; P Bucher

doi:10.1002/pro.5560030118

Protein Sci. 1994 Jan;3(1):139-46. doi: 10.1002/pro.5560030118.

Authors

R Lüthy¹, I Xenarios, P Bucher

Affiliation

¹ Swiss Institute for Experimental Cancer Research (ISREC), Epalinges.

Abstract

The sequence profile method (Gribskov M, McLachlan AD, Eisenberg D, 1987, Proc Natl Acad Sci USA 84:4355-4358) is a powerful tool for detecting distant relationships between amino acid sequences. A profile is a table of position-specific scores and gap penalties that provides a generalized description of a protein motif and can be used in place of an individual sequence for sequence alignments and database searches. A sequence profile is derived from a multiple sequence alignment. We have found 2 ways to improve the sensitivity of sequence profiles: (1) Sequence weights: Assigning an individual weight to each sequence avoids bias toward closely related sequences. These weights are assigned automatically, based on the distances between the sequences, using a published procedure (Sibbald PR, Argos P, 1990, J Mol Biol 216:813-818). (2) Amino acid substitution table: In addition to the alignment, the construction of a profile requires an amino acid substitution table. We have found that in some cases a newer table, the BLOSUM45 table (Henikoff S, Henikoff JG, 1992, Proc Natl Acad Sci USA 89:10915-10919), is more sensitive than the original Dayhoff table or the modified Dayhoff table used in the current implementation. Profiles derived by the improved method are more sensitive and selective in a number of cases where previous methods failed to completely separate true members from false positives.
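
The effect of sequence weights on a profile can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the commonly described "average" profile score, in which the score for residue a at a column is the weighted frequency of each observed residue b multiplied by the substitution score S(a, b) and summed over b. The two-letter alphabet, the `TOY_SUBST` table, the function names, and the hand-picked weights are all hypothetical; a real profile would use a full table such as BLOSUM45 and weights computed by a procedure such as that of Sibbald and Argos. Gap penalties are omitted.

```python
from collections import defaultdict


def weighted_column_frequencies(alignment, weights):
    """For each alignment column, return {residue: weighted frequency}.

    Gap characters ('-') are skipped; frequencies in a column sum to 1.
    """
    length = len(alignment[0])
    columns = []
    for pos in range(length):
        totals = defaultdict(float)
        for seq, weight in zip(alignment, weights):
            residue = seq[pos]
            if residue != "-":
                totals[residue] += weight
        norm = sum(totals.values())
        columns.append({r: t / norm for r, t in totals.items()} if norm else {})
    return columns


def build_profile(alignment, weights, subst, alphabet):
    """Return one {residue: score} dict per column (assumed average-score form)."""
    profile = []
    for column in weighted_column_frequencies(alignment, weights):
        profile.append(
            {a: sum(freq * subst[a][b] for b, freq in column.items())
             for a in alphabet}
        )
    return profile


# Hypothetical two-letter alphabet and substitution table, for illustration only.
ALPHABET = "AG"
TOY_SUBST = {"A": {"A": 2, "G": -1}, "G": {"A": -1, "G": 2}}

# Two identical sequences and one divergent sequence.
alignment = ["AG", "AG", "GG"]

# Equal weights let the duplicated sequence dominate the first column.
uniform = build_profile(alignment, [1.0, 1.0, 1.0], TOY_SUBST, ALPHABET)

# Hand-picked weights that down-weight the two identical sequences.
reweighted = build_profile(alignment, [0.5, 0.5, 1.0], TOY_SUBST, ALPHABET)

print(uniform[0])
print(reweighted[0])
```

With equal weights, the first column favors A because two of the three sequences are identical. Down-weighting the duplicates scores A and G equally in that column, which is the kind of bias correction the abstract attributes to sequence weighting.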

Publication types

Comparative Study

MeSH terms

Amino Acid Sequence
Globins / chemistry
Heat-Shock Proteins / chemistry
Molecular Sequence Data
Proto-Oncogene Proteins pp60(c-src) / chemistry
Sensitivity and Specificity
Sequence Analysis / methods*
Sequence Analysis / statistics & numerical data
Sequence Homology
Software

Substances

Heat-Shock Proteins
Globins
Proto-Oncogene Proteins pp60(c-src)