Merging applicability domains for in silico assessment of chemical mutagenicity

J Chem Inf Model. 2014 Mar 24;54(3):793-800. doi: 10.1021/ci500016v. Epub 2014 Feb 14.


Using a benchmark Ames mutagenicity data set, we evaluated the performance of molecular fingerprints as descriptors for developing quantitative structure-activity relationship (QSAR) models and defining applicability domains with two machine-learning methods: random forest (RF) and variable nearest neighbor (v-NN). The two methods focus on complementary aspects of chemical mutagenicity and use different characteristics of the molecular fingerprints to achieve high prediction accuracy. Whereas RF flags mutagenic compounds using the presence or absence of small molecular fragments akin to structural alerts, the v-NN method uses molecular structural similarity as measured by fingerprint-based Tanimoto distances between molecules. We showed that the extended connectivity fingerprints could intuitively be used to define and quantify an applicability domain for either method. The importance of using applicability domains in QSAR modeling cannot be overstated: compounds that are outside the applicability domain do not have any close representative in the training set, and therefore reliable predictions cannot be made for them. Using either approach, we developed highly robust models that rival the performance of a state-of-the-art proprietary software package. Importantly, because the two methods take complementary approaches, we showed that combining the model predictions raised the fraction of compounds covered by the applicability domain from roughly 80% to 90%. These results indicated that the proposed QSAR protocol constituted a highly robust chemical mutagenicity prediction model.
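The distance-based applicability domain described above can be illustrated with a small sketch. It is not the authors' implementation: fingerprints are represented as plain Python sets of "on" bit indices rather than real extended connectivity fingerprints, the cutoff `d0` and smoothing factor `h` are arbitrary illustrative values, the Gaussian-style weighting is an assumption about how a v-NN prediction might be formed, and `merge_predictions` is a hypothetical rule for combining two models, since the abstract does not say how the predictions were merged.

```python
import math


def tanimoto_distance(a, b):
    """Tanimoto distance, 1 - |a & b| / |a | b|, between two sets of fingerprint bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def vnn_predict(query, training, d0=0.5, h=0.3):
    """Distance-weighted mean label of all training compounds within d0.

    Returns None when no training compound lies within d0, i.e. the query
    is treated as outside the applicability domain. The weighting scheme
    and the default d0 and h are assumptions made for illustration.
    """
    num = 0.0
    den = 0.0
    for fingerprint, label in training:
        d = tanimoto_distance(query, fingerprint)
        if d <= d0:
            w = math.exp(-((d / h) ** 2))
            num += w * label
            den += w
    return num / den if den > 0 else None


def coverage(queries, training, d0=0.5, h=0.3):
    """Fraction of queries that receive a prediction (fall inside the domain)."""
    preds = [vnn_predict(q, training, d0=d0, h=h) for q in queries]
    return sum(p is not None for p in preds) / len(preds)


def merge_predictions(p_a, p_b):
    """Hypothetical merge rule: average when both models predict, otherwise
    fall back to whichever model has the compound in its domain."""
    if p_a is not None and p_b is not None:
        return (p_a + p_b) / 2.0
    return p_a if p_a is not None else p_b


# Toy training set: (fingerprint bits, label), with 1 = mutagenic, 0 = non-mutagenic.
training = [
    ({1, 2, 3, 4}, 1),
    ({1, 2, 3, 5}, 1),
    ({10, 11, 12}, 0),
]
queries = [{1, 2, 3, 4}, {10, 11, 12, 13}, {20, 21}]
```

In this toy setup the third query shares no bits with any training compound, so `vnn_predict` returns `None` for it and `coverage(queries, training)` counts only two of the three queries as inside the domain. A merge rule such as `merge_predictions` can only widen coverage when the second model's domain includes compounds that the first one rejects, which is the sense in which complementary domains raise the covered fraction.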

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Artificial Intelligence
  • Computer Simulation
  • Databases, Pharmaceutical
  • Models, Biological
  • Mutagens / chemistry*
  • Mutagens / toxicity*
  • Quantitative Structure-Activity Relationship*


Substances

  • Mutagens