Detection of recurring three-dimensional side-chain patterns is a potential means of inferring protein function. This paper presents a new method for detecting such patterns and discusses its implications. The method detects side-chain patterns without any prior knowledge of function, requiring only protein structure data and associated multiple sequence alignments. A recursive, depth-first search algorithm finds all possible groups of identical amino acids common to two protein structures, independent of sequence order. The search space is restricted by distance constraints and by ignoring amino acids unlikely to be involved in protein function. A weighted root-mean-square deviation (RMSD) between equivalenced groups of amino acids is used as a measure of similarity. The statistical significance of any RMSD is assigned by reference to a distribution fitted to simulated data. Searches with the Ser/His/Asp catalytic triad, a His/His porphyrin-binding pattern, and the zinc-finger Cys/Cys/His/His pattern are performed to test the method on known examples. An all-against-all comparison of representatives from the structural classification of proteins (SCOP) is performed, revealing several new examples of evolutionary convergence to common side-chain patterns within different tertiary folds and in different orders along the sequence. These include a di-zinc-binding Asp/Asp/His/His/Ser pattern common to alkaline phosphatase and bacterial aminopeptidase, and an Asp/Glu/His/His/Asn/Asn pattern common to the active sites of DNase I and endocellulase E1. Implications for protein evolution, function prediction and the rational design of functional regulators are discussed.
Copyright 1998 Academic Press.
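The recursive, depth-first search described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the residue representation (one point per residue), the thresholds `max_dist` and `tol`, and the `skip_types` filter are all assumptions made for illustration. The score shown is a simple RMS difference of intra-group distances, used here as a stand-in for the weighted RMSD after superposition that the paper describes.

```python
from itertools import combinations
from math import dist, sqrt


def find_common_patterns(a, b, max_dist=12.0, tol=1.5, min_size=3,
                         skip_types=frozenset()):
    """Depth-first search for groups of identical residues shared by a and b.

    a, b: lists of (residue_type, (x, y, z)), one representative point per
    residue (an assumption of this sketch). Returns maximal groups, each a
    list of (index_in_a, index_in_b) pairs. Sequence order is never used.
    """
    # Candidate equivalences: same residue type, not in the ignored set.
    pairs = [(i, j)
             for i, (ta, _) in enumerate(a)
             for j, (tb, _) in enumerate(b)
             if ta == tb and ta not in skip_types]

    def compatible(p, q):
        (i, j), (k, l) = p, q
        if i == k or j == l:          # each residue may be used only once
            return False
        da = dist(a[i][1], a[k][1])   # distance within structure a
        db = dist(b[j][1], b[l][1])   # matching distance within structure b
        return da <= max_dist and db <= max_dist and abs(da - db) <= tol

    def is_maximal(group):
        # No remaining pair can be added without violating a constraint.
        return not any(all(compatible(c, g) for g in group) for c in pairs)

    results = []

    def extend(group, start):
        # Recurse on every pair consistent with all members chosen so far.
        for n in range(start, len(pairs)):
            if all(compatible(pairs[n], g) for g in group):
                extend(group + [pairs[n]], n + 1)
        if len(group) >= min_size and is_maximal(group):
            results.append(group)

    extend([], 0)
    return results


def distance_rms(a, b, group):
    """RMS difference of intra-group distances (stand-in similarity score)."""
    diffs = [dist(a[i][1], a[k][1]) - dist(b[j][1], b[l][1])
             for (i, j), (k, l) in combinations(group, 2)]
    return sqrt(sum(d * d for d in diffs) / len(diffs))
```

Calling `find_common_patterns` on two residue lists returns every maximal group of same-type residues whose pairwise distances agree within `tol`, regardless of where the residues occur in either sequence. Pruning on the distance constraints at each recursion step is what keeps the search tractable in this sketch.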