Measuring CAMD technique performance. 2. How "druglike" are drugs? Implications of Random test set selection exemplified using druglikeness classification models

J Chem Inf Model. 2007 Jan-Feb;47(1):110-4. doi: 10.1021/ci6003493.

Abstract

Research in computer-aided molecular design (CAMD) tends to focus on algorithm development, often to the detriment of the data set selection and analysis used to validate those algorithms. Here we highlight the potential problems this can cause in the context of druglikeness classification. More rigorous efforts are applied to the selection of decoy (nondruglike) molecules from the Available Chemicals Directory (ACD). Model performance under the standard technique of random test set creation is compared with performance on test sets derived from explicit ontological separation by drug class. The dangers of viewing druglike space as sufficiently coherent to permit simple classification are highlighted. In addition, the issues inherent in applying unfiltered data and random test set selection to (Q)SAR models utilizing large and supposedly heterogeneous databases are discussed.
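
The contrast between random test set creation and separation by drug class can be illustrated with a minimal sketch. This is not the authors' protocol: the toy records, the class labels, and the helper names `random_split`, `class_split`, and `shared_classes` are all hypothetical and serve only to show how the two splitting strategies differ in what the test set shares with the training set.

```python
import random

# Hypothetical toy records: (molecule_id, drug_class). The class labels are
# placeholders, not the ontology used in the paper.
MOLECULES = [
    (f"mol{i:02d}", cls)
    for i, cls in enumerate(
        ["antibiotic"] * 6 + ["antihistamine"] * 6 + ["statin"] * 6
    )
]


def random_split(records, test_fraction=0.33, seed=0):
    """Shuffle the records and cut off a test set, ignoring drug class."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def class_split(records, held_out_classes):
    """Hold out whole drug classes so none of them appears in training."""
    held_out = set(held_out_classes)
    train = [r for r in records if r[1] not in held_out]
    test = [r for r in records if r[1] in held_out]
    return train, test


def shared_classes(train, test):
    """Drug classes present in both the training set and the test set."""
    return {c for _, c in train} & {c for _, c in test}


rand_train, rand_test = random_split(MOLECULES)
cls_train, cls_test = class_split(MOLECULES, ["statin"])

print("random split, shared classes:", sorted(shared_classes(rand_train, rand_test)))
print("class split, shared classes: ", sorted(shared_classes(cls_train, cls_test)))
```

With a class-based split the set of shared classes is empty by construction, so the test molecules come from a drug class the model has never seen. A random split will usually leave members of every test class in the training set, which is one plausible reason for performance estimates from random test sets to look more favourable.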

MeSH terms

  • Artificial Intelligence
  • Classification
  • Databases, Factual
  • Methods
  • Models, Molecular*
  • Pharmaceutical Preparations / chemistry*
  • Pharmaceutical Preparations / classification
  • Quantitative Structure-Activity Relationship*

Substances

  • Pharmaceutical Preparations