Predictivity of Simulated ADME AutoQSAR Models over Time

Mol Inform. 2011 Mar 14;30(2-3):256-66. doi: 10.1002/minf.201000160. Epub 2011 Mar 17.

Abstract

The automation of model building and model updating (autoQSAR) is an important step towards real-time support of small-molecule drug discovery projects using the latest experimental data. We present a simulation study, using real company data, of the behaviour of QSAR models over time. Three global QSAR models, for human plasma protein binding, aqueous solubility and log D7.4, are updated on a monthly basis over a period of three years. The effect of updating on model predictivity is studied using a series of monthly temporal test sets in addition to a final terminal temporal test set. Partial Least Squares (PLS), Random Forest (RF) and Bayesian Neural Network (BNN) models are examined, covering three distinctly different approaches to QSAR modelling. It is demonstrated that the models are able to predict forward in time, but that updating them regularly increases their ability to make predictions for current compounds. The degree of improvement depends on the property studied and the model-building technique used. These results demonstrate the importance of regular model updating. For both static models predicting forward in time and regularly updated models, RF models are shown to be the most predictive for these data sets.

Keywords: ADME; ADMET; BNN; PLS; Quantitative structure-activity relationships (QSAR); Quantitative structure-property relationships (QSPR); Random forest (RF); autoQSAR.
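The monthly updating protocol described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. A one-descriptor least-squares fit (`fit_line`, hypothetical) stands in for PLS, RF or BNN. The `Record` type and the drifting synthetic data are invented for illustration. Only the monthly temporal test sets are simulated; the terminal temporal test set is omitted. The static model is trained once, while the updated model is retrained each month on everything registered before that month. Both are scored on the compounds registered in that month.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Record:
    month: int   # month in which the measurement was registered
    x: float     # a single hypothetical descriptor
    y: float     # measured property (e.g. log D7.4)


def fit_line(train):
    """Ordinary least-squares fit of y = a + b*x.

    Hypothetical stand-in for PLS, RF or BNN; only the train/test
    protocol matters in this sketch.
    """
    xs = [r.x for r in train]
    ys = [r.y for r in train]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def rmse(model, test):
    """Root-mean-square error of a fitted model on a list of records."""
    return mean((model(r.x) - r.y) ** 2 for r in test) ** 0.5


def temporal_simulation(data, first_month, last_month):
    """Compare a static model with a monthly updated model.

    The static model is trained once on everything registered before
    first_month. For each month m, the updated model is retrained on
    everything registered before m. Both are scored on the temporal
    test set of records registered in month m.
    """
    static = fit_line([r for r in data if r.month < first_month])
    results = []
    for m in range(first_month, last_month + 1):
        test = [r for r in data if r.month == m]
        if not test:
            continue
        updated = fit_line([r for r in data if r.month < m])
        results.append((m, rmse(static, test), rmse(updated, test)))
    return results


# Hypothetical synthetic data (not from the paper): the property drifts by
# 0.5 units per month at fixed descriptor values.
data = [Record(m, float(i), float(i) + 0.5 * m) for m in range(12) for i in range(5)]
results = temporal_simulation(data, first_month=6, last_month=11)
for month, static_err, updated_err in results:
    print(f"month {month:2d}: static RMSE {static_err:.2f}, updated RMSE {updated_err:.2f}")
```

On this toy drift, the two models are identical in the first test month. After that, the updated model's error grows more slowly, reaching 3.00 versus 4.25 RMSE by month 11. The real magnitude of any such gain depends, as the abstract notes, on the property and the modelling technique.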