Locally weighted learning methods for predicting dose-dependent toxicity with application to the human maximum recommended daily dose

Chem Res Toxicol. 2012 Oct 15;25(10):2216-26. doi: 10.1021/tx300279f. Epub 2012 Sep 26.


Toxicological experiments in animals are carried out to determine the type and severity of any potential toxic effect associated with a new lead compound. The collected data are then used to extrapolate the effects on humans and determine initial dose regimens for clinical trials. The underlying assumption is that the severity of the toxic effects in animals is correlated with that in humans. However, toxic effects generally correlate poorly across species. Thus, it is more advantageous to predict the toxicological effects of a compound on humans directly from the human toxicological data of related compounds. However, many popular quantitative structure-activity relationship (QSAR) methods that build a single global model by fitting all training data appear inappropriate for predicting toxicological effects of structurally diverse compounds, because the observed toxicological effects may originate from very different and mostly unknown molecular mechanisms. In this article, we demonstrate, through application to human maximum recommended daily dose data, that locally weighted learning methods, such as k-nearest neighbors, are well suited for predicting toxicological effects of structurally diverse compounds. We also show that a significant flaw of the k-nearest neighbor method is that it always uses a constant number of nearest neighbors to make a prediction for a target compound, irrespective of whether those neighbors are structurally similar enough to the target compound to ensure that they share the same mechanism of action. To remedy this flaw, we propose and implement a variable number nearest neighbor method. The advantages of the variable number nearest neighbor method over other QSAR methods include (1) allowing more reliable predictions to be achieved by applying a tighter molecular distance threshold and (2) automatically detecting when a prediction should not be made because the compound is outside the applicability domain.
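The contrast between a fixed number of neighbors and a distance threshold can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the Gaussian weighting kernel, the `smoothing` parameter, and the example numbers are all assumptions made for illustration, since the abstract does not specify the distance metric or the weighting scheme.

```python
import math
from typing import Optional, Sequence


def knn_predict(distances: Sequence[float], values: Sequence[float], k: int) -> float:
    """Fixed-k baseline: average the k closest training compounds,
    regardless of how distant (dissimilar) they are."""
    nearest = sorted(zip(distances, values))[:k]
    return sum(v for _, v in nearest) / len(nearest)


def vnn_predict(
    distances: Sequence[float],
    values: Sequence[float],
    threshold: float,
    smoothing: float = 0.3,  # hypothetical kernel width, not from the source
) -> Optional[float]:
    """Variable-number sketch: use every training compound within
    `threshold` of the target, and decline to predict if there is none."""
    neighbors = [(d, v) for d, v in zip(distances, values) if d <= threshold]
    if not neighbors:
        return None  # target is outside the applicability domain
    # Assumed Gaussian weighting: closer compounds count more.
    weights = [math.exp(-((d / smoothing) ** 2)) for d, _ in neighbors]
    total = sum(weights)
    if total == 0.0:
        return None  # weights underflowed; treat as no usable neighbor
    return sum(w * v for w, (_, v) in zip(weights, neighbors)) / total


# Hypothetical distances from a target compound to four training compounds,
# with made-up activity values for those compounds.
distances = [0.1, 0.2, 0.8, 0.9]
values = [1.0, 2.0, 10.0, 12.0]

fixed = knn_predict(distances, values, k=3)            # pulls in a distant compound
local = vnn_predict(distances, values, threshold=0.3)  # uses only the two close ones
none = vnn_predict(distances, values, threshold=0.05)  # None: no neighbor is close enough
```

With these made-up inputs, the fixed-k prediction is dragged toward the value of a dissimilar compound, while the thresholded version stays between the values of the two close neighbors and returns `None` when nothing lies within the threshold. Tightening `threshold` trades coverage for reliability, which is the behavior described in points (1) and (2) of the abstract.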

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Artificial Intelligence
  • Dose-Response Relationship, Drug*
  • Drug-Related Side Effects and Adverse Reactions / chemically induced*
  • Humans
  • Models, Biological
  • Pharmaceutical Preparations / chemistry*
  • Quantitative Structure-Activity Relationship


Substances

  • Pharmaceutical Preparations