Revisiting the yeast PPR proteins--application of an Iterative Hidden Markov Model algorithm reveals new members of the rapidly evolving family

Mol Biol Evol. 2011 Oct;28(10):2935-48. doi: 10.1093/molbev/msr120. Epub 2011 May 4.

Abstract

Pentatricopeptide repeat (PPR) proteins are the largest known RNA-binding protein family and are found in all eukaryotes, being particularly abundant in higher plants. PPR proteins localize mostly to mitochondria and chloroplasts, and many have been shown to modulate organellar genome expression at the posttranscriptional level. Although the genomes of land plants encode hundreds of PPR proteins, only a few have been identified in Fungi and Metazoa. Because the current PPR motif profiles are built mainly on the predominant plant sequences, they are unlikely to be optimal for detecting fungal and animal members of the family, and many putative PPR proteins in these genomes may remain undetected. To test this hypothesis, we designed a hidden Markov model-based bioinformatic tool called Supervised Clustering-based Iterative Phylogenetic Hidden Markov Model algorithm for the Evaluation of tandem Repeat motif families (SCIPHER), using sequence data from orthologous clusters from available yeast genomes. This approach allowed us to assign 12 new proteins in Saccharomyces cerevisiae to the PPR family. Similarly, in other yeast species, we obtained a 5-fold increase in the detection of PPR motifs compared with the previous tools. All the newly identified S. cerevisiae PPR proteins localize to the mitochondrion and are part of the RNA processing interaction network. Furthermore, the yeast PPR proteins appear to undergo accelerated divergent evolution. Analysis of single and double amino acid substitutions in the Dmr1 protein of S. cerevisiae suggests that cooperative interactions between motifs and pseudoreversion could be the force driving this rapid evolution.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Cluster Analysis
  • Evolution, Molecular*
  • Genome, Mitochondrial
  • Genomics / methods*
  • Markov Chains*
  • Molecular Sequence Data
  • Phylogeny
  • RNA-Binding Proteins / genetics*
  • Saccharomyces cerevisiae Proteins / genetics*
  • Sequence Alignment

Substances

  • RNA-Binding Proteins
  • Saccharomyces cerevisiae Proteins