A hybrid mixture discriminant analysis-random forest computational model for the prediction of volume of distribution of drugs in human

J Med Chem. 2006 Apr 6;49(7):2262-7. doi: 10.1021/jm050200r.


A computational approach is described that can predict the VD(ss) of new compounds in humans to within 2-fold of the actual value. A dataset of VD values for 384 drugs in humans was used to train a hybrid mixture discriminant analysis-random forest (MDA-RF) model using 31 computed descriptors. Descriptors included terms describing lipophilicity, ionization, molecular volume, and various molecular fragments. For a test set of 23 proprietary compounds not used in model construction, the geometric mean fold-error (GMFE) was 1.78-fold (±11.4%). The model was also tested using a leave-class-out approach, in which subsets of drugs defined by therapeutic class were removed from the training set of 384, the model was recast, and the VD(ss) values for each removed subset were predicted. GMFE values ranged from 1.46- to 2.94-fold, depending on the subset. Finally, for an additional set of 74 compounds, VD(ss) predictions made using the computational model were compared to predictions made using previously described methods that depend on animal pharmacokinetic data. Computational VD(ss) predictions differed, on average, 2.13-fold from the VD(ss) predictions based on animal data. The computational model described can predict human VD(ss) with an accuracy comparable to that of predictions requiring substantially greater effort and can be applied in place of animal experimentation.
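The abstract reports accuracy as GMFE and validates with a leave-class-out scheme but does not spell out either calculation. The following is a minimal sketch, assuming the commonly used definition of GMFE as the geometric mean of per-compound fold errors, where each fold error is the larger of predicted/observed and observed/predicted. The function names (`fold_error`, `gmfe`, `fraction_within_fold`, `leave_class_out`) are hypothetical and are not taken from the paper. The MDA-RF model itself is not reproduced here.

```python
import math
from collections import defaultdict


def fold_error(predicted, observed):
    """Return the fold error (>= 1) between one predicted and one observed value.

    Assumed definition: the larger of predicted/observed and observed/predicted.
    """
    if predicted <= 0 or observed <= 0:
        raise ValueError("values must be positive")
    return max(predicted / observed, observed / predicted)


def gmfe(predicted, observed):
    """Geometric mean fold error over paired predictions and observations."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the fold errors on a log scale, then transform back.
    log_sum = sum(math.log(fold_error(p, o)) for p, o in zip(predicted, observed))
    return math.exp(log_sum / len(predicted))


def fraction_within_fold(predicted, observed, fold=2.0):
    """Fraction of compounds whose fold error does not exceed `fold`."""
    errors = [fold_error(p, o) for p, o in zip(predicted, observed)]
    return sum(e <= fold for e in errors) / len(errors)


def leave_class_out(classes):
    """Yield (held_out_class, train_indices, test_indices) for each class label.

    Each class (for example, a therapeutic class) is held out once; a model
    would be refit on the train indices and evaluated on the test indices.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(classes):
        by_class[label].append(index)
    for label in sorted(by_class):
        test = by_class[label]
        held_out = set(test)
        train = [i for i in range(len(classes)) if i not in held_out]
        yield label, train, test
```

With per-class predictions in hand, calling `gmfe` on each held-out subset would give one GMFE value per class, which is the shape of the range the abstract reports. Any input values used with these helpers are the reader's own; none are supplied by the source.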

MeSH terms

  • Algorithms
  • Computer Simulation
  • Drug Design
  • Humans
  • Models, Biological*
  • Pharmaceutical Preparations / metabolism*
  • Pharmacokinetics*
  • Tissue Distribution


Substances

  • Pharmaceutical Preparations