Improved sensitivity of profile searches through the use of sequence weights and gap excision

Comput Appl Biosci. 1994 Feb;10(1):19-29. doi: 10.1093/bioinformatics/10.1.19.

Abstract

Position-specific substitution matrices, known as profiles, derived from multiple sequence alignments are currently used to search sequence databases for distantly related members of protein families. The performance of the database searches is enhanced by using (i) a sequence weighting scheme which assigns higher weights to more distantly related sequences based on branch lengths derived from phylogenetic trees, (ii) exclusion of positions with mainly padding characters at sites of insertions or deletions and (iii) the BLOSUM62 residue comparison matrix. A natural consequence of these modifications is an improvement in the alignment of new sequences to the profiles. However, the accuracy of the alignments can be further increased by employing a similarity residue comparison matrix. These developments are implemented in a program called PROFILEWEIGHT which runs on Unix and VAX computers. The only input required by the program is the multiple sequence alignment. The output from PROFILEWEIGHT is a profile designed to be used by existing searching and alignment programs. Test results from database searches with four different families of proteins show the improved sensitivity of the weighted profiles.
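
As an illustration of the first two ideas in the abstract (tree-based sequence weights and gap excision), the following is a minimal Python sketch, not the PROFILEWEIGHT implementation. Everything specific in it is an assumption made for the example: the nested-list tree representation, the rule that each branch length is shared equally among the sequences below that branch, the 0.5 gap-fraction cutoff, and all function names. The abstract does not give the exact formulas. The final step of combining the weighted frequencies with a residue comparison matrix such as BLOSUM62 is omitted.

```python
GAP = "-"

def leaves(node):
    """Return the leaf names under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    return [name for child, _ in node for name in leaves(child)]

def tree_weights(node, weights=None):
    """Assumed scheme: each branch length is split equally among the
    leaves below it, and a leaf's weight is the sum of its shares along
    the path from the root. Closely related sequences share most of
    their path, so each one ends up with a smaller weight."""
    if weights is None:
        weights = {}
    if isinstance(node, str):
        weights.setdefault(node, 0.0)
        return weights
    for child, length in node:
        below = leaves(child)
        for name in below:
            weights[name] = weights.get(name, 0.0) + length / len(below)
        tree_weights(child, weights)
    return weights

def excise_gap_columns(alignment, max_gap_fraction=0.5):
    """Return indices of columns kept after dropping mostly-gap columns.
    The 0.5 cutoff is a hypothetical choice, not taken from the paper."""
    n_seqs = len(alignment)
    keep = []
    for col in range(len(alignment[0])):
        gaps = sum(1 for seq in alignment if seq[col] == GAP)
        if gaps / n_seqs <= max_gap_fraction:
            keep.append(col)
    return keep

def weighted_frequencies(alignment, names, weights, columns):
    """Weighted residue frequencies for each retained column."""
    profile = []
    for col in columns:
        counts, total = {}, 0.0
        for name, seq in zip(names, alignment):
            residue = seq[col]
            if residue == GAP:
                continue
            counts[residue] = counts.get(residue, 0.0) + weights[name]
            total += weights[name]
        profile.append({r: w / total for r, w in counts.items()} if total else {})
    return profile

# Toy data: A and B are close relatives, C is more distant.
# A node is a list of (child, branch_length) pairs.
tree = [([("A", 0.1), ("B", 0.1)], 0.2), ("C", 0.5)]
names = ["A", "B", "C"]
alignment = ["MK-LV", "MK-LV", "MRGIV"]

weights = tree_weights(tree)
kept = excise_gap_columns(alignment)
profile = weighted_frequencies(alignment, names, weights, kept)
```

In this toy case the third column is a gap in two of the three sequences, so it is excised. In the second column the distant sequence C contributes more to the frequency of R than a simple count of one sequence in three would give it, which is the intended effect of down-weighting the near-duplicate pair A and B.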

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Animals
  • Databases, Factual*
  • Evaluation Studies as Topic
  • Mice
  • Proteins / genetics*
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Alignment / statistics & numerical data
  • Sequence Homology, Amino Acid
  • Software*

Substances

  • Proteins