Align-m--a new algorithm for multiple alignment of highly divergent sequences

Bioinformatics. 2004 Jun 12;20(9):1428-35. doi: 10.1093/bioinformatics/bth116. Epub 2004 Feb 12.

Abstract

Motivation: Multiple alignment of highly divergent sequences is a challenging problem on which available programs tend to perform poorly. This is generally due either to a scoring function that does not describe biological reality accurately enough or to a heuristic that cannot explore the solution space efficiently enough. To address this problem, we present a new program, Align-m, that uses a non-progressive local approach to guide a global alignment.

Results: Two large test sets were used that represent the entire SCOP classification and cover sequence similarities between 0 and 50% identity. Performance was compared with the publicly available algorithms ClustalW, T-Coffee and DiAlign. In general, Align-m has comparable or slightly higher accuracy in terms of correctly aligned residues, especially for distantly related sequences. Importantly, it aligns far fewer residues incorrectly, with average differences of over 15% compared with some of the other algorithms.
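
The distinction between correctly and incorrectly aligned residues can be illustrated with a small sketch. The abstract does not give the exact accuracy measures used, so the code below assumes one common pairwise formulation: a residue pair is correct if the test alignment and a reference alignment both place it in the same column, and incorrect if only the test alignment does. The function names `aligned_pairs` and `score_alignment` and the toy sequences are hypothetical and are not taken from Align-m.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings. Each pair is
    ((seq_i, residue_i), (seq_j, residue_j)) with seq_i < seq_j, where
    residue indices count non-gap characters within each sequence.
    """
    counters = [0] * len(alignment)  # next residue index per sequence
    pairs = set()
    for column in zip(*alignment):
        present = []
        for seq_idx, symbol in enumerate(column):
            if symbol != gap:
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        # Every two residues in the same column form one aligned pair.
        pairs.update(combinations(present, 2))
    return pairs


def score_alignment(test, reference):
    """Compare a test alignment with a reference alignment.

    Assumed definitions (not necessarily those of the paper):
      correct_fraction   = shared pairs / pairs in the reference
      incorrect_fraction = test-only pairs / pairs in the test alignment
    """
    test_pairs = aligned_pairs(test)
    ref_pairs = aligned_pairs(reference)
    correct = len(test_pairs & ref_pairs)
    incorrect = len(test_pairs - ref_pairs)
    return {
        "correct_fraction": correct / len(ref_pairs) if ref_pairs else 0.0,
        "incorrect_fraction": incorrect / len(test_pairs) if test_pairs else 0.0,
    }


# Toy example: the test alignment misplaces one residue of the first sequence.
reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
result = score_alignment(test, reference)
print(result)
```

Under these assumed definitions, a method that leaves uncertain regions unaligned can lower its incorrect fraction without lowering its correct fraction, which is why the two quantities are worth reporting separately.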

Availability: Align-m and the test sets are available at http://bioinformatics.vub.ac.be

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Genetic Variation / genetics*
  • Molecular Sequence Data
  • Proteins / analysis
  • Proteins / chemistry*
  • Proteins / genetics*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid

Substances

  • Proteins