Improving position-specific predictions of protein functional sites using phylogenetic motifs

Bioinformatics. 2008 Oct 15;24(20):2308-16. doi: 10.1093/bioinformatics/btn454. Epub 2008 Aug 21.

Abstract

Motivation: Accurate computational prediction of protein functional sites is critical to maximizing the utility of recent high-throughput sequencing efforts. Of the available approaches, position-specific conservation scores remain among the most popular because of their accuracy and ease of computation. However, high false-positive rates remain a limiting factor. Using phylogenetic motifs (PMs), we have developed two combined (conservation + PM) prediction schemes that significantly improve prediction accuracy.
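To show why position-specific conservation scores are considered easy to compute, here is a minimal sketch of one of the simplest such scores: one minus the normalized Shannon entropy of each alignment column. This is an illustrative, assumed example. The abstract does not name the specific conservation scores evaluated, and the function name `column_conservation` and the gap-handling rule are hypothetical choices.

```python
import math
from collections import Counter

# Assumed alphabet: the 20 standard amino acids; gaps are written as '-'.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_conservation(alignment: list[str]) -> list[float]:
    """Hypothetical example score: 1 - normalized Shannon entropy per column.

    Gaps are ignored when counting residues; a fully gapped column scores 0.
    Higher values mean more conserved columns (1.0 = invariant column).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    max_entropy = math.log(len(AMINO_ACIDS))  # entropy of a uniform column
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


if __name__ == "__main__":
    # Columns 0 and 1 are invariant; column 2 mixes D and E (gap ignored).
    print(column_conservation(["ACD", "ACE", "AC-"]))
```

More elaborate scores (for example, ones that weight sequences or use substitution matrices) follow the same pattern of mapping each column to a single number.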

Results: Our first approach, called position-specific MINER (psMINER), rank orders alignment columns by conservation. Positions that are not also identified as PMs are then excluded from the prediction set. This approach improves prediction accuracy over the underlying conservation scores, and the improvement is statistically significant. The gain is general: improvement is observed across several different conservation scores that span a continuum of complexity. In addition, a hybrid MINER (hMINER) that quantitatively considers both scoring regimes provides further improvement. More importantly, it provides critical insight into the relative importance of phylogeny versus alignment conservation. Both methods outperform other common prediction algorithms that also utilize phylogenetic concepts. Finally, we demonstrate that the presented results are critically sensitive to the definition of a functional site, highlighting the need for more complete benchmarks within the prediction community.
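The psMINER filtering step described above can be sketched as follows: rank alignment columns by a conservation score, then drop every column that is not a phylogenetic motif position. This is a minimal sketch under stated assumptions. The PM positions are taken as a precomputed input (how PMs are identified is not reproduced here), the function name `ps_miner_rank` is hypothetical, ties are broken by column index as an arbitrary choice, and the hMINER scoring is not modeled.

```python
def ps_miner_rank(conservation: dict[int, float], pm_positions: set[int]) -> list[int]:
    """Hypothetical sketch of the psMINER filter.

    conservation: column index -> conservation score (higher = more conserved).
    pm_positions: column indices assumed to be already identified as PMs.
    Returns PM columns ordered from most to least conserved.
    """
    # Step 1: rank all columns by conservation, descending (ties by index).
    ranked = sorted(conservation, key=lambda pos: (-conservation[pos], pos))
    # Step 2: exclude columns that are not also phylogenetic motif positions.
    return [pos for pos in ranked if pos in pm_positions]


if __name__ == "__main__":
    scores = {0: 0.90, 1: 0.95, 2: 0.40, 3: 0.80}
    pms = {0, 2, 3}
    # Column 1 is the most conserved but is not a PM, so it is dropped.
    print(ps_miner_rank(scores, pms))
```

How many of the surviving top-ranked columns to report as predictions is a separate cutoff choice that the abstract does not specify.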

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Motifs
  • Binding Sites
  • Computational Biology / methods
  • Conserved Sequence
  • Databases, Protein
  • Phylogeny*
  • Protein Conformation
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Proteins / classification
  • Proteins / genetics
  • Sequence Alignment / methods
  • Sequence Analysis, Protein

Substances

  • Proteins