Embedding strategies for effective use of information from multiple sequence alignments

Protein Sci. 1997 Mar;6(3):698-705. doi: 10.1002/pro.5560060319.

Abstract

We describe a new strategy for using multiple sequence alignment information to detect distant relationships in sequence database searches. A single sequence representing a protein family is enriched by replacing its conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman search algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single-sequence query search programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves search performance by extracting multiple alignment information from motif regions while retaining single-sequence information where alignment is uncertain.
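
The consensus-embedding idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (column_consensus, embed_consensus), the block representation (a start offset plus a list of ungapped aligned segments), the majority-vote consensus rule, and the example sequences are all assumptions made for illustration. The abstract does not say how consensus residues were derived.

```python
from collections import Counter


def column_consensus(segments):
    """Return the most common residue in each column of equal-length segments.

    Majority vote is an assumed consensus rule, chosen only for illustration.
    """
    if not segments:
        raise ValueError("a block needs at least one aligned segment")
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("all segments in a block must have the same length")
    consensus = []
    for column in zip(*segments):
        # most_common(1) returns the top residue; ties go to the residue
        # seen first in the column.
        consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)


def embed_consensus(representative, blocks):
    """Replace conserved regions of a representative sequence with consensus residues.

    blocks is a list of (start, segments) pairs, where start is the 0-based
    offset of the conserved region in the representative sequence and
    segments are the ungapped aligned regions from family members.
    Residues outside the blocks are left unchanged.
    """
    residues = list(representative)
    for start, segments in blocks:
        consensus = column_consensus(segments)
        end = start + len(consensus)
        if start < 0 or end > len(residues):
            raise ValueError("block falls outside the representative sequence")
        residues[start:end] = consensus
    return "".join(residues)


# Hypothetical family: one conserved block covering positions 4-9.
representative = "MKTAYIAKQRQISFVKSHFSRQ"
blocks = [(4, ["YIAKQR", "YLAKQR", "YLSKQK", "FLAKQR"])]
query = embed_consensus(representative, blocks)
print(query)
```

The resulting string is an ordinary single sequence, so under these assumptions it could be passed to a single-sequence search program as a query. Embedding PSSMs instead would require a search program that accepts position-specific scores for the embedded regions.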

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Consensus Sequence
  • Evaluation Studies as Topic
  • Molecular Sequence Data
  • Proteins / chemistry*
  • Sequence Alignment*

Substances

  • Proteins

Associated data

  • SWISSPROT/UNKNOWN