Real-time ligand binding pocket database search using local surface descriptors

Proteins. 2010 Jul;78(9):2007-28. doi: 10.1002/prot.22715.


Because of the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. Because the functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to use structure information in function characterization. Of particular interest is the prediction of ligand binding to a protein, as ligand molecule recognition is a major part of the molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem because of the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules that bind to a local surface site in a protein can be predicted by finding similar local pockets with known bound ligands in a structure database. Here, we present two representations of ligand binding pockets and use them for ligand binding prediction by pocket shape comparison. These representations are based on mapping the surface properties of binding pockets, which are compactly described by either two-dimensional pseudo-Zernike moments or three-dimensional Zernike descriptors. These compact representations allow fast, real-time pocket searching against a database. Thorough benchmark studies employing two different datasets show that our representations are competitive with other existing methods. The limitations and potential of shape-based methods, as well as possible improvements, are discussed.
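The search idea described in the abstract, comparing a query pocket's compact descriptor against a database of pockets with known ligands, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the descriptors are toy three-component vectors rather than actual Zernike descriptors, the distance is plain Euclidean, and the rank-weighted vote over the k nearest pockets is a hypothetical scoring rule. The function names `euclidean` and `rank_ligands` are likewise invented.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_ligands(query, database, k=3):
    """Rank candidate ligands for a query pocket descriptor.

    database is a list of (ligand_label, descriptor_vector) pairs.
    The k nearest pockets each vote for their ligand with weight 1/rank
    (a hypothetical scoring rule chosen only for illustration).
    Returns (ligand, score) pairs sorted by decreasing score.
    """
    neighbors = sorted(database, key=lambda entry: euclidean(query, entry[1]))[:k]
    scores = defaultdict(float)
    for rank, (ligand, _vector) in enumerate(neighbors, start=1):
        scores[ligand] += 1.0 / rank
    return sorted(scores.items(), key=lambda item: -item[1])


if __name__ == "__main__":
    # Toy database: labels and numbers are made up, not real descriptors.
    toy_database = [
        ("AMP", [0.90, 0.10, 0.30]),
        ("AMP", [0.80, 0.20, 0.35]),
        ("ATP", [0.20, 0.90, 0.50]),
        ("FAD", [0.10, 0.20, 0.95]),
    ]
    toy_query = [0.85, 0.15, 0.30]
    for ligand, score in rank_ligands(toy_query, toy_database, k=3):
        print(ligand, round(score, 3))
```

Because each pocket is reduced to a short fixed-length vector, a query costs one distance computation per database entry, which is what makes real-time searching plausible; a production system would likely replace the linear scan with an index.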

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Adenosine Monophosphate
  • Algorithms
  • Area Under Curve
  • Binding Sites*
  • Cluster Analysis
  • Computational Biology / methods*
  • Databases, Genetic*
  • Ligands
  • Models, Statistical*
  • Protein Binding
  • Protein Structure, Tertiary*
  • Proteins / chemistry
  • Proteins / classification
  • ROC Curve


Substances

  • Ligands
  • Proteins
  • Adenosine Monophosphate