Simple chained guide trees give high-quality protein multiple sequence alignments

Proc Natl Acad Sci U S A. 2014 Jul 22;111(29):10556-61. doi: 10.1073/pnas.1405628111. Epub 2014 Jul 7.

Abstract

Guide trees determine the order in which sequences are aligned in the progressive multiple sequence alignment heuristic. These guide trees are often the limiting factor in making large alignments, and considerable effort has been expended over the years on constructing them quickly or accurately. In this article we show that, at least for protein families with large numbers of sequences that can be benchmarked with known structures, simple chained guide trees give the most accurate alignments. These are also computationally the fastest and simplest guide trees to construct. Such guide trees have a striking effect on the accuracy of alignments produced by some of the most widely used alignment packages: once the number of sequences goes much above a few hundred, there is a marked increase in accuracy and a marked decrease in computational time. This holds even if the order of sequences in the guide tree is random.
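
To make the idea of a chained guide tree concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "chained" means a fully unbalanced (caterpillar) tree in which each sequence is added, one at a time, to the growing alignment, and it writes the tree in Newick format without branch lengths. The function name `chained_guide_tree` and its `shuffle` and `seed` parameters are hypothetical, introduced only for illustration.

```python
import random


def chained_guide_tree(names, shuffle=False, seed=None):
    """Return a Newick string for a fully chained (caterpillar) guide tree.

    Assumption: the first two sequences are joined, and every later
    sequence is attached to the root of the tree built so far.
    """
    order = list(names)
    if len(order) < 2:
        raise ValueError("need at least two sequences")
    if shuffle:
        # Random sequence order; the seed only makes the sketch reproducible.
        random.Random(seed).shuffle(order)

    # Join the first pair, then chain each remaining sequence onto the tree.
    tree = f"({order[0]},{order[1]})"
    for name in order[2:]:
        tree = f"({tree},{name})"
    return tree + ";"


print(chained_guide_tree(["seq1", "seq2", "seq3", "seq4"]))
# prints (((seq1,seq2),seq3),seq4);
```

Building such a tree needs no pairwise distance calculation, which is consistent with the abstract's remark that these are the simplest and fastest guide trees to construct. Whether and how an alignment package accepts a user-supplied tree, and in which format, depends on the package and is not covered by this sketch.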

Keywords: Clustal; MAFFT; MUSCLE; Pfam.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cytochrome P-450 Enzyme System / chemistry
  • Databases, Protein
  • Proteins / chemistry*
  • Reference Standards
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein*
  • Software*

Substances

  • Proteins
  • Cytochrome P-450 Enzyme System