Phylogenetic information improves homology detection

Proteins. 2001 Dec 1;45(4):360-71. doi: 10.1002/prot.1156.

Abstract

We present a database search method that is based on phylogenetic trees (treesearch). The method is used to search a protein sequence database for homologs to a protein family. In preparation for the search, a phylogenetic tree is constructed from a given multiple alignment of the family. During the search, each database sequence is temporarily inserted into the tree, thus adding a new edge to the tree. Homology between family and sequence is then judged from the length of this edge. In a comparison of our method to profiles (ISREC pfsearch), two implementations of hidden Markov models (HMMER hmmsearch and SAM hmmscore), and to the family pairwise search (FPS) method on 43 families from the SCOP database based on minimum false-positive counts (min-FPCs), we found a considerable gain in sensitivity. In 69% of the test cases, treesearch showed a min-FPC of at most 50, whereas the two second best methods (hmmsearch and FPS) showed this performance only in 53% cases. A similar impression holds for a large range of min-FPC thresholds. The results demonstrate that phylogenetic information can significantly improve the detection of distant homologies and justify our method as a useful alternative to existing methods.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Databases, Protein
  • Humans
  • Methods
  • Phylogeny*
  • Proteins / chemistry
  • Proteins / genetics
  • Sequence Alignment
  • Sequence Homology, Amino Acid*

Substances

  • Proteins