Common Structural Core of Three-Dozen Residues Reveals Intersuperfamily Relationships

Mol Biol Evol. 2016 Jul;33(7):1697-710. doi: 10.1093/molbev/msw047. Epub 2016 Mar 1.

Abstract

Identification of relationships among protein families or superfamilies is a challenge. However, functionally essential protein regions typically retain structural integrity, even when the corresponding protein sequences evolve. Consequently, comparison of protein structures enables deeper phylogenetic analyses than achievable through the use of sequence information only. Here, we focus on a group of distantly related viral and cellular enzymes involved in nucleic acid or nucleotide processing and synthesis. All these enzymes share an apparently similar protein fold at their active site, which resembles the palm subdomain of the right-hand-shaped polymerases. Using a structure-based hierarchical clustering method, we identified a common structural core of 36 equivalent residues for this functionally diverse group of enzymes, representing five protein superfamilies. Based on the properties of these 36 residues, we deduced a structural distance-based tree in which the proteins were accurately clustered according to the established family classification. Within this tree, the enzymes catalyzing genomic nucleic acid replication or transcription were separated from those performing supplementary nucleic acid or nucleotide processing functions. In addition, we found that the family Y DNA polymerases are structurally more closely related to the nucleotide cyclase superfamily members than to the other members of the DNA/RNA polymerase superfamily, and these enzymes share 88 equivalent residues comprising a β1-α1-α2-β2-β3-α3-β4-α4-β5 fold. The results highlight the power of structure-based hierarchical clustering in identifying remote evolutionary relationships. Furthermore, our study implies that a protein substructure of only three-dozen residues can contain a substantial amount of information on the evolutionary history of proteins.
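
The idea of deriving a tree from pairwise structural distances can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm or the distance measure, so the example below assumes generic average-linkage (UPGMA) clustering as a stand-in, not the authors' method. The labels P1 to P4 and all distance values are hypothetical placeholders, not data from the study.

```python
from itertools import combinations

def upgma(names, dist):
    """Average-linkage clustering (generic stand-in, not the paper's method).

    dist maps frozenset({a, b}) -> distance between items a and b.
    Returns a Newick-like tree string and the list of merges.
    """
    # Each cluster key maps to (subtree string, number of members).
    clusters = {n: (n, 1) for n in names}
    d = dict(dist)
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))]
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        new = f"({tree_a},{tree_b})"
        # Distance to the merged cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((new, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        clusters[new] = (new, n_a + n_b)
        merges.append((a, b, height))
    (tree, _), = clusters.values()
    return tree, merges

# Hypothetical pairwise structural distances (illustrative values only).
names = ["P1", "P2", "P3", "P4"]
toy = {
    frozenset(("P1", "P2")): 0.20,
    frozenset(("P3", "P4")): 0.30,
    frozenset(("P1", "P3")): 0.80,
    frozenset(("P1", "P4")): 0.90,
    frozenset(("P2", "P3")): 0.70,
    frozenset(("P2", "P4")): 0.85,
}
tree, merges = upgma(names, toy)
print(tree)
```

In this toy input the two closest pairs are joined first and the resulting groups are merged last, which is the sense in which a distance-based tree separates groups of structurally similar proteins.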

Keywords: nucleic acid and nucleotide processing enzymes; polymerase evolution; protein evolution; structural alignment; structural distances.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Catalytic Domain
  • Cluster Analysis
  • Evolution, Molecular
  • Genomics
  • Models, Molecular
  • Phylogeny
  • Proteins / chemistry*
  • Proteins / genetics*
  • Sequence Alignment / methods
  • Sequence Analysis, Protein / methods*
  • Structural Homology, Protein*
  • Structure-Activity Relationship

Substances

  • Proteins