Power keys: a novel class of topological descriptors based on exhaustive subgraph enumeration and their application in substructure searching

J Chem Inf Model. 2011 Nov 28;51(11):2843-51. doi: 10.1021/ci200282z. Epub 2011 Oct 18.


We present a novel class of topological molecular descriptors, which we call power keys. Power keys are computed by enumerating all possible linear, branched, and cyclic subgraphs up to a given size, encoding the connected atoms and bonds into two separate components, and recording the number of occurrences of each subgraph. We have applied these new descriptors to the screening stage of substructure searching on a relational database of about 1 million compounds using a diverse set of reference queries. The new keys can eliminate the vast majority (>99.9% on average) of nonmatching molecules within a fraction of a second. More importantly, for many of the queries the screening efficiency is 100%. We identified a common feature of the molecules for which power keys have perfect discriminative ability. This feature can be exploited to obviate the need for expensive atom-by-atom matching in situations where some ambiguity can be tolerated (fuzzy substructure searching). Other advantages over commonly used molecular keys are also discussed.
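The idea of count-based screening with enumerated subgraphs can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the encoding, so the key used here (a sorted tuple of atom symbols plus a sorted tuple of bond orders) is an assumption, as are the function names `connected_edge_subsets`, `power_key_counts`, and `passes_screen`, the `max_bonds` size limit, and the hydrogen-suppressed toy molecules. A target passes the screen only if it contains every key of the query at least as many times as the query does, which is a necessary but not sufficient condition for a substructure match.

```python
from collections import Counter


def connected_edge_subsets(bonds, max_bonds):
    """Return every connected set of bond indices with at most max_bonds bonds.

    bonds is a list of (atom_i, atom_j, bond_order) tuples.
    """
    seen = set()
    # Start from single bonds and grow each subset by one adjacent bond.
    frontier = {frozenset([e]) for e in range(len(bonds))}
    while frontier:
        seen |= frontier
        grown = set()
        for subset in frontier:
            if len(subset) == max_bonds:
                continue
            atoms = {a for e in subset for a in bonds[e][:2]}
            for e, (i, j, _) in enumerate(bonds):
                if e not in subset and (i in atoms or j in atoms):
                    grown.add(subset | {e})
        frontier = grown - seen
    return seen


def power_key_counts(atoms, bonds, max_bonds):
    """Count subgraphs per key; the key layout is an illustrative assumption."""
    counts = Counter()
    for subset in connected_edge_subsets(bonds, max_bonds):
        atom_ids = {a for e in subset for a in bonds[e][:2]}
        atom_part = tuple(sorted(atoms[a] for a in atom_ids))   # atom component
        bond_part = tuple(sorted(bonds[e][2] for e in subset))  # bond component
        counts[(atom_part, bond_part)] += 1
    return counts


def passes_screen(query_counts, target_counts):
    """True if the target has every query key at least as often as the query."""
    return all(target_counts[key] >= n for key, n in query_counts.items())


# Toy, hydrogen-suppressed molecules: (atom symbols, bonds).
ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
propane = (["C", "C", "C"], [(0, 1, 1), (1, 2, 1)])
query = (["C", "O"], [(0, 1, 1)])  # a C-O single bond

query_counts = power_key_counts(*query, max_bonds=2)
for name, mol in [("ethanol", ethanol), ("propane", propane)]:
    print(name, passes_screen(query_counts, power_key_counts(*mol, max_bonds=2)))
```

In this sketch the key is only a graph invariant, not a canonical form, so molecules that pass would still need atom-by-atom matching unless some ambiguity is acceptable. In a database setting the counts for each compound would be computed once and stored, so that screening reduces to comparing counts.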

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Computational Biology / statistics & numerical data
  • Databases, Factual
  • Drug Discovery / methods*
  • Drug Discovery / statistics & numerical data
  • Fuzzy Logic
  • Models, Molecular
  • Software*
  • Structure-Activity Relationship