Measuring CAMD technique performance. 2. How "druglike" are drugs? Implications of Random test set selection exemplified using druglikeness classification models

Andrew C Good; Mark A Hermsmeier

doi:10.1021/ci6003493

J Chem Inf Model. 2007 Jan-Feb;47(1):110-4. doi: 10.1021/ci6003493.

Authors

Andrew C Good¹, Mark A Hermsmeier

Affiliation

¹ Bristol-Myers Squibb, 5 Research Parkway, Wallingford, Connecticut 06492, USA. andrew.good@bms.com

PMID: 17238255
DOI: 10.1021/ci6003493

Abstract

Research into the advancement of computer-aided molecular design (CAMD) tends to focus on algorithm development, often to the detriment of the data set selection and analysis used to validate those algorithms. Here we highlight the potential problems this can cause in the context of druglikeness classification. More rigorous efforts are applied to the selection of decoy (nondruglike) molecules from the Available Chemicals Directory (ACD). Model performance under the standard technique of random test set creation is compared with performance on test sets derived from explicit ontological separation by drug class. The dangers of viewing druglike space as sufficiently coherent to permit simple classification are highlighted. In addition, the issues inherent in applying unfiltered data and random test set selection to (Q)SAR models utilizing large and supposedly heterogeneous databases are discussed.
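
The contrast the abstract draws between random test set creation and separation by drug class can be illustrated with a minimal sketch. This is not the authors' procedure: the records, class names, and helper functions below are hypothetical, and the split logic is only one plausible reading of "explicit ontological separation by drug class". The point it shows is that a random split tends to leave every drug class represented in both training and test sets, whereas a class-based holdout leaves the test classes entirely unseen during training.

```python
import random


def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle all records and hold out a fraction, ignoring drug class."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def class_holdout_split(records, held_out_classes):
    """Put every record from the held-out drug classes into the test set."""
    held = set(held_out_classes)
    train = [r for r in records if r[1] not in held]
    test = [r for r in records if r[1] in held]
    return train, test


def shared_classes(train, test):
    """Drug classes that appear in both the training and the test set."""
    return {cls for _, cls in train} & {cls for _, cls in test}


# Hypothetical toy data: (molecule id, drug class). Real work would use
# actual structures and a curated drug-class ontology.
class_labels = ["beta-lactam"] * 10 + ["statin"] * 10 + ["benzodiazepine"] * 10
records = [(f"mol{i}", cls) for i, cls in enumerate(class_labels)]

rand_train, rand_test = random_split(records)
cls_train, cls_test = class_holdout_split(records, ["statin"])

# Random split: test classes are also present in training.
print("random split, shared classes:", sorted(shared_classes(rand_train, rand_test)))
# Class holdout: no class overlap between training and test.
print("class holdout, shared classes:", sorted(shared_classes(cls_train, cls_test)))
```

A model scored on the random split is assessed on close analogs of its training compounds, so its reported accuracy may overstate how well it generalizes to a drug class it has never seen.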

MeSH terms

Artificial Intelligence
Classification
Databases, Factual
Methods
Models, Molecular*
Pharmaceutical Preparations / chemistry*
Pharmaceutical Preparations / classification
Quantitative Structure-Activity Relationship*

Substances

Pharmaceutical Preparations