Identification of side-chain clusters in protein structures by a graph spectral method

J Mol Biol. 1999 Sep 17;292(2):441-64. doi: 10.1006/jmbi.1999.3058.

Abstract

This paper presents a novel method to detect side-chain clusters in protein three-dimensional structures using a graph spectral approach. Protein side-chain interactions are represented by a labeled graph in which the nodes represent the Cβ atoms and the edges represent the distances between the Cβ atoms. The distance information and the non-bonded connectivity of the residues are represented in the form of a matrix called the Laplacian matrix. The constructed matrix is diagonalized. Clustering information is obtained from the vector components associated with the second lowest eigenvalue, and cluster centers are obtained from the vector components associated with the top eigenvalues. The method uses global information for clustering, and a single numeric computation is required to detect clusters of interest. The approach has been adopted here to detect a variety of side-chain clusters and to identify the cluster center, that is, the residue which makes the largest number of interactions among the residues forming the cluster. Detecting such clusters and cluster centers is important from a protein structure and folding point of view. The crucial residues in the folding pathway, as determined by ΦF values obtained from protein engineering methods (ΦF is a measure of the effect of a mutation on the stability of the transition state of folding), can be identified from the vector components corresponding to the top eigenvalues. Expanded clusters are detected near the active and binding sites of the protein, supporting the nucleation condensation hypothesis for folding. The method is also shown to detect domains in protein structures and conserved side-chain clusters in topologically similar proteins.
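
As a rough illustration of the procedure outlined in the abstract, the Python sketch below builds a weighted Laplacian from Cβ coordinates, takes a two-way split from the eigenvector of the second lowest eigenvalue, and picks one center per cluster from the eigenvector of the largest eigenvalue. It is not the authors' implementation. Several details are assumptions that the abstract does not specify: the 1/distance edge weight, the 7 Å cutoff, the min_seq_sep filter standing in for non-bonded connectivity, the split by sign, the use of only the single largest eigenvalue for centers, and the toy coordinates. The function names are hypothetical.

```python
import numpy as np


def laplacian_from_coords(coords, cutoff=7.0, min_seq_sep=1):
    """Weighted graph Laplacian L = D - A for a set of C-beta coordinates.

    Assumptions (not taken from the abstract): edge weight = 1 / distance,
    edges only between residues within `cutoff` angstroms and at least
    `min_seq_sep` positions apart in sequence.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Pairwise Euclidean distances via broadcasting.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    idx = np.arange(n)
    seq_sep = np.abs(idx[:, None] - idx[None, :])
    # max(1, ...) always excludes self-pairs, so 1 / dist never divides by zero.
    connected = (dist <= cutoff) & (seq_sep >= max(1, min_seq_sep))
    weights = np.zeros((n, n))
    weights[connected] = 1.0 / dist[connected]
    # Degree matrix minus weighted adjacency matrix.
    return np.diag(weights.sum(axis=1)) - weights


def spectral_clusters(coords, cutoff=7.0, min_seq_sep=1):
    """Two-way split from the second lowest eigenvector, plus one center per side."""
    lap = laplacian_from_coords(coords, cutoff, min_seq_sep)
    # eigh returns eigenvalues in ascending order, eigenvectors as columns.
    eigvals, eigvecs = np.linalg.eigh(lap)
    second_lowest = eigvecs[:, 1]
    top = eigvecs[:, -1]
    # Simplification: split nodes by the sign of the second lowest eigenvector.
    labels = (second_lowest >= 0).astype(int)
    centers = {}
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        # Center = member with the largest-magnitude component in the top eigenvector.
        centers[int(label)] = int(members[np.argmax(np.abs(top[members]))])
    return labels, centers, eigvals


# Toy input: two tetrahedral groups of four points, 9 angstroms apart along x.
tetra = 1.5 * np.array(
    [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float
)
coords = np.vstack([tetra, tetra + np.array([9.0, 0.0, 0.0])])

labels, centers, eigvals = spectral_clusters(coords)
print("cluster labels:", labels)
print("cluster centers (node indices):", centers)
```

In this toy input the first four points and the last four points land on opposite sides of the split. Which side is labeled 0 or 1 depends on the arbitrary sign of the eigenvector. For real structures, the coordinates would come from the Cβ atoms of a protein, and more than two clusters would require looking beyond a single sign split.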

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Computer Simulation
  • Databases, Factual
  • Hemoglobins / chemistry
  • Models, Molecular
  • Molecular Sequence Data
  • Protein Engineering
  • Protein Folding
  • Protein Structure, Secondary
  • Protein Structure, Tertiary
  • Proteins / chemistry*

Substances

  • Hemoglobins
  • Proteins