Structures of protein complexes provide atomistic insights into protein interactions. Human proteins represent a quarter of all structures in the Protein Data Bank; however, available protein complexes cover less than 10% of the human proteome. Although it is theoretically possible to infer interactions in human proteins based on structures of homologous protein complexes, it is still unclear to what extent protein interactions and binding sites are conserved, and whether protein complexes from remotely related species can be used to infer interactions and binding sites. We considered biological units of protein complexes and clustered protein-protein binding sites into similarity groups based on their structure and sequence, which allowed us to identify unique binding sites. We showed that the growth rate of the number of unique binding sites in the Protein Data Bank was much slower than the growth rate of the number of structural complexes. Next, we investigated the evolutionary roots of unique binding sites and identified the major phyletic branches with the largest expansion in the number of novel binding sites. We found that many binding sites could be traced to the universal common ancestor of all cellular organisms, whereas relatively few binding sites emerged at the major evolutionary branching points. We analyzed the physicochemical properties of unique binding sites and found that the most ancient sites were the largest in size, involved many salt bridges, and were the most compact and least planar. In contrast, binding sites that appeared more recently in the evolution of eukaryotes were characterized by a larger fraction of polar and aromatic residues, and were less compact and more planar, possibly due to their more transient nature and roles in signaling processes.
Copyright © 2015 Biophysical Society. Published by Elsevier Inc. All rights reserved.