Molecular similarity: a key technique in molecular informatics

Org Biomol Chem. 2004 Nov 21;2(22):3204-18. doi: 10.1039/B409813G. Epub 2004 Oct 14.


Molecular Informatics utilises many ideas and concepts to find relationships between molecules. The concept of similarity, where molecules may be grouped according to their biological effects or physicochemical properties has found extensive use in drug discovery. Some areas of particular interest have been in lead discovery and compound optimisation. For example, in designing libraries of compounds for lead generation, one approach is to design sets of compounds "similar" to known active compounds in the hope that alternative molecular structures are found that maintain the properties required while enhancing e.g. patentability, medicinal chemistry opportunities or even in achieving optimised pharmacokinetic profiles. Thus the practical importance of the concept of molecular similarity has grown dramatically in recent years. The predominant users are pharmaceutical companies, employing similarity methods in a wide range of applications e.g. virtual screening, estimation of absorption, distribution, metabolism, excretion and toxicity (ADME/Tox) and prediction of physicochemical properties (solubility, partitioning etc.). In this perspective, we discuss the representation of molecular structure (descriptors), methods of comparing structures and how these relate to measured properties. This leads to the concept of molecular similarity, its various definitions and uses and how these have evolved in recent years. Here, we wish to evaluate and in some cases challenge accepted views and uses of molecular similarity. Molecular similarity, as a paradigm, contains many implicit and explicit assumptions in particular with respect to the prediction of the binding and efficacy of molecules at biological receptors. The fundamental observation is that molecular similarity has a context which both defines and limits its use. The key issues of solvation effects, heterogeneity of binding sites and the fundamental problem of the form of similarity measure to use are addressed.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • Linear Models
  • Models, Molecular*
  • Protein Binding
  • Proteins / chemistry
  • Proteins / metabolism
  • Quantitative Structure-Activity Relationship
  • Receptors, Drug


  • Proteins
  • Receptors, Drug