MultiMCS: a fast algorithm for the maximum common substructure problem on multiple molecules

J Chem Inf Model. 2011 Apr 25;51(4):788-806. doi: 10.1021/ci100297y. Epub 2011 Mar 29.

Abstract

Several efficient correspondence graph-based algorithms for determining the maximum common substructure (MCS) of a pair of molecules have been published in the literature. The extension of the problem to three or more molecules is however nontrivial; heuristics used to increase the efficiency in the two-molecule case are either inapplicable to the many-molecule case or do not provide significant speedups. Our specific algorithmic contribution is two-fold. First, we show how the correspondence graph approach for the two-molecule case can be generalized to obtain an algorithm that is guaranteed to find the optimum connected MCS of multiple molecules, and that runs fast on most families of molecules using a new divide-and-conquer strategy that has hitherto not been reported in this context. Second, we provide a characterization of those compound families for which the algorithm might run slowly, along with a heuristic for speeding up computations on these families. We also extend the above algorithm to a heuristic algorithm to find the disconnected MCS of multiple molecules and to an algorithm for clustering molecules into groups, with each group sharing a substantial MCS. Our methods are flexible in that they provide exquisite control on various matching criteria used to define a common substructure.

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Cluster Analysis
  • Matched-Pair Analysis
  • Molecular Structure*
  • Pattern Recognition, Automated / methods*