Background: There are many ways to represent a molecule's properties, including atomic-connectivity drawings, NMR spectra, and molecular orbital models. Prior methods for predicting the biological activity of compounds have largely depended on these physical representations. Measuring a compound's binding potency against a small reference panel of diverse proteins defines a very different representation of the molecule, which we call an affinity fingerprint. Statistical analysis of such fingerprints provides new insights into aspects of binding interactions that are shared among a wide variety of proteins. These analyses facilitate prediction of how fingerprinted compounds will bind to new proteins.
Results: Affinity fingerprints are reported for 122 structurally diverse compounds using a reference panel of eight proteins that collectively generate unique fingerprints for about 75% of the small organic compounds tested. Application of multivariate regression techniques to this database enables the creation of computational surrogates to represent new proteins that are surprisingly effective at predicting binding potencies. We illustrate this for two enzymes with no previously recognizable similarity to each other or to any of the reference proteins. Fitting analogous computational surrogates to four other proteins confirms the generality of the method; when applied to a fingerprinted library of 5000 compounds, several sub-micromolar hits were correctly predicted.
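As an illustration of the surrogate idea, one simple form it could take is a linear model that expresses a compound's potency against a new protein as a weighted sum of its fingerprint values. The sketch below is not the regression procedure used in the study, which the text describes only as multivariate regression. It assumes ordinary least squares, a three-protein panel for brevity, potencies on a logarithmic scale, and synthetic numbers throughout; the function names are hypothetical.

```python
# Sketch: fit a linear "computational surrogate" for a new protein.
# Assumption: ordinary least squares on synthetic data, not the study's method or data.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations act on both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: training set too small or degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_surrogate(fingerprints, target_potencies):
    """Return least-squares weights (intercept first) via the normal equations."""
    rows = [[1.0] + list(fp) for fp in fingerprints]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target_potencies)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, fingerprint):
    """Predicted potency against the new protein for one fingerprint."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], fingerprint))


# Synthetic training set: fingerprints of five compounds on a 3-protein panel.
train_fingerprints = [
    [4.0, 5.0, 6.0],
    [5.5, 4.5, 7.0],
    [6.0, 6.5, 4.0],
    [7.0, 5.0, 5.5],
    [4.5, 7.0, 6.5],
]
# Synthetic "measured" potencies generated from known weights, so the fit can be checked.
true_weights = [0.5, 1.0, -0.5, 0.25]
train_potencies = [predict(true_weights, fp) for fp in train_fingerprints]

weights = fit_surrogate(train_fingerprints, train_potencies)
print([round(w, 6) for w in weights])
```

Once fitted on a small training set of compounds measured against the new protein, `predict` can be applied to every fingerprint in a larger library without further assays. With real, noisy measurements the recovered weights would only approximate any underlying relationship.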
Conclusions: An affinity fingerprint database, which provides a rich source of data defining operational similarities among proteins, can be used to test theories of cryptic homology that is not anticipated by the current understanding of protein structure. Practical applications to drug design include efficient pre-screening of large numbers of compounds against target proteins using fingerprint similarities, supplemented by a small number of empirical measurements, to select promising compounds for further study.
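The similarity-based pre-screening mentioned above can be sketched as a nearest-neighbor ranking: given the fingerprint of a compound already measured as active against the target, rank the fingerprinted library by closeness to it and assay only the top candidates. The distance measure (Euclidean), the compound names, and all fingerprint values here are assumptions made for illustration; the text does not state which similarity measure was used.

```python
import math

# Sketch: shortlist library compounds whose fingerprints resemble a known hit.
# Assumption: Euclidean distance and synthetic values, chosen only for illustration.

def fingerprint_distance(fp_a, fp_b):
    """Euclidean distance between two affinity fingerprints of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_a, fp_b)))


def rank_by_similarity(library, reference_fp, top_n):
    """Return the names of the top_n compounds closest to reference_fp."""
    ranked = sorted(
        library,
        key=lambda name: fingerprint_distance(library[name], reference_fp),
    )
    return ranked[:top_n]


# Hypothetical fingerprinted library (3-protein panel for brevity).
library = {
    "cmpd_A": [4.0, 5.0, 6.0],
    "cmpd_B": [6.9, 5.1, 5.4],
    "cmpd_C": [5.5, 4.5, 7.0],
    "cmpd_D": [7.2, 4.8, 5.6],
}
known_hit_fp = [7.0, 5.0, 5.5]  # fingerprint of a compound measured as active

shortlist = rank_by_similarity(library, known_hit_fp, top_n=2)
print(shortlist)  # → ['cmpd_B', 'cmpd_D']
```

Only the shortlisted compounds would then go forward to empirical measurement, which is where the savings over assaying the full library would come from.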