Three-dimensional cluster analysis offers a method for the prediction of functional residue clusters in proteins. This method requires a representative structure and a multiple sequence alignment as input data. Individual residues are represented in terms of regional alignments that reflect both their structural environment and their evolutionary variation, as defined by the alignment of homologous sequences. From the overall (global) and the residue-specific (regional) alignments, we calculate the global and regional similarity matrices, containing scores for all pairwise sequence comparisons in the respective alignments. Comparing the matrices yields two scores for each residue. The regional conservation score (C(R)(x)) defines the conservation of each residue x and its neighbors in 3D space relative to the protein as a whole. The similarity deviation score (S(x)) detects residue clusters with sequence similarities that deviate from the similarities suggested by the full-length sequences. We evaluated 3D cluster analysis on a set of 35 families of proteins with available cocrystal structures, showing small ligand interfaces, nucleic acid interfaces and two types of protein-protein interfaces (transient and stable). We present two examples in detail: fructose-1,6-bisphosphate aldolase and the mitogen-activated protein kinase ERK2. We found that the regional conservation score (C(R)(x)) identifies functional residue clusters better than a scoring scheme that does not take 3D information into account. C(R)(x) is particularly useful for the prediction of poorly conserved, transient protein-protein interfaces. Many of the proteins studied contained residue clusters with elevated similarity deviation scores. These residue clusters correlate with specificity-conferring regions: 3D cluster analysis therefore represents an easily applied method for the prediction of functionally relevant spatial clusters of residues in proteins.
Copyright 2001 Academic Press.