We present a method to assess the significance of binding site similarities within superimposed protein three-dimensional (3D) structures and apply it to all similar structures in the Protein Data Bank. For similarities between 3D structures lacking significant sequence similarity, we distinguish remote homology (an ancient common ancestor) from analogy (likely convergence to a folding motif) according to the Structural Classification of Proteins (SCOP) database. Supersites are defined as structural locations on groups of analogous proteins (i.e. superfolds) that show a statistically significant tendency to bind substrates despite little evidence of a common ancestor for the proteins considered. We identify three potentially new superfolds containing supersites: ferredoxin-like folds, four-helical bundles and double-stranded beta helices. In addition, the method quantifies binding site similarities within homologous proteins and previously identified supersites such as that found in the beta/alpha (TIM) barrels. For the nine superfolds, we assess the accuracy of predictions of binding site locations. Implications for protein evolution, and for the prediction of protein function through either fold recognition or tertiary structure comparison, are discussed.
Copyright 1998 Academic Press.