A word-oriented approach to alignment validation

Bioinformatics. 2005 May 15;21(10):2230-9. doi: 10.1093/bioinformatics/bti335. Epub 2005 Feb 22.

Abstract

Motivation: Multiple sequence alignment at the level of whole proteomes requires a high degree of automation, precluding the use of traditional validation methods such as manual curation. Since evolutionary models are too general to describe the history of each residue in a protein family, there is no single algorithm/model combination that can yield a biologically or evolutionarily optimal alignment. We propose a 'shotgun' strategy where many different algorithms are used to align the same family, and the best of these alignments is then chosen with a reliable objective function. We present WOOF, a novel 'word-oriented' objective function that relies on the identification and scoring of conserved amino acid patterns (words) between pairs of sequences.

Results: Tests on a subset of reference protein alignments from BAliBASE showed that WOOF tended to rank the (manually curated) reference alignment highest among 1060 alternative (automatically generated) alignments for a majority of protein families. Among the automated alignments, there was a strong positive relationship between the WOOF score and similarity to the reference alignment. The speed of WOOF and its independence from explicit considerations of three-dimensional structure make it an excellent tool for analyzing large numbers of protein families.

Availability: On request from the authors.

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Natural Language Processing
  • Pattern Recognition, Automated / methods*
  • Proteins / analysis*
  • Proteins / chemistry*
  • Semantics
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*

Substances

  • Proteins