Probabilistic scoring measures for profile-profile comparison yield more accurate short seed alignments

Bioinformatics. 2003 Aug 12;19(12):1531-9. doi: 10.1093/bioinformatics/btg185.

Abstract

Motivation: The development of powerful automatic methods for the comparison of protein sequences has become increasingly important. Profile-to-profile comparisons draw on broader information about protein families, resulting in more sensitive and accurate comparisons of distantly related sequences. A key part of comparing two profiles is the method used to score matches between profile positions. A number of methods based on various theoretical considerations have been proposed. We implemented several previously reported scoring functions as well as our own functions, and compared them on the basis of their ability to produce accurate short ungapped alignments of a given length.

Results: Our results suggest that the family of probabilistic methods (log-odds based methods and prof_sim) may be the more appropriate choice for generating initial 'seeds' as the first step in producing local profile-profile alignments. The most effective scoring systems were closely related modifications of functions previously implemented in the COMPASS and Picasso methods.
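
As a rough illustration of the kind of computation discussed here, the following Python sketch scores pairs of profile columns with a symmetric log-odds-style function and scans all offsets for the best ungapped segment of a fixed length. The function names, the toy two-letter alphabet, and the exact form of the score are assumptions made for illustration only; they are not the scoring functions evaluated in the paper, nor the COMPASS, Picasso, or prof_sim implementations.

```python
import math


def column_score(p, q, background):
    """Symmetric log-odds-style score between two profile columns.

    p, q and background map each residue to a probability. All values
    are assumed to be strictly positive (e.g. after adding pseudocounts),
    otherwise math.log would fail. This form is an illustrative
    assumption, not a formula taken from the paper.
    """
    score = 0.0
    for residue, bg in background.items():
        score += p[residue] * math.log(q[residue] / bg)
        score += q[residue] * math.log(p[residue] / bg)
    return score


def best_ungapped_seed(profile_a, profile_b, background, length):
    """Return (score, start_a, start_b) of the best ungapped segment.

    Every pair of start positions is tried, and the column scores of
    the `length` aligned positions are summed. Returns None if either
    profile is shorter than `length`.
    """
    best = None
    for i in range(len(profile_a) - length + 1):
        for j in range(len(profile_b) - length + 1):
            total = sum(
                column_score(profile_a[i + k], profile_b[j + k], background)
                for k in range(length)
            )
            if best is None or total > best[0]:
                best = (total, i, j)
    return best


# Toy data (hypothetical): a two-letter alphabet with a uniform background.
background = {"A": 0.5, "B": 0.5}
mostly_a = {"A": 0.9, "B": 0.1}
mostly_b = {"A": 0.1, "B": 0.9}

profile_a = [mostly_a, mostly_a, mostly_b]
profile_b = [mostly_b, mostly_a, mostly_a, mostly_b, mostly_b]

seed = best_ungapped_seed(profile_a, profile_b, background, length=3)
print(seed)
```

The exhaustive scan is quadratic in the profile lengths, which is acceptable for a sketch; a practical tool would likely restrict or index the search. Swapping in a different `column_score` is enough to compare scoring schemes under the same seed-finding procedure.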

Publication types

  • Comparative Study
  • Evaluation Study
  • Validation Study

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Computer Simulation*
  • Gene Expression Profiling / methods*
  • Models, Genetic*
  • Models, Statistical*
  • Molecular Sequence Data
  • Reproducibility of Results
  • Sample Size
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*