Comparative chemometric modeling of cytochrome 3A4 inhibitory activity of structurally diverse compounds using stepwise MLR, FA-MLR, PLS, GFA, G/PLS and ANN techniques

Eur J Med Chem. 2009 Jul;44(7):2913-22. doi: 10.1016/j.ejmech.2008.12.004. Epub 2008 Dec 16.


Twenty-eight structurally diverse cytochrome 3A4 (CYP3A4) inhibitors were subjected to quantitative structure-activity relationship (QSAR) studies. The analyses used electronic, spatial, topological, and thermodynamic descriptors calculated with Cerius 2 version 10 software. The statistical tools comprised linear methods [multiple linear regression with factor analysis as a preprocessing step (FA-MLR), stepwise MLR, partial least squares (PLS), genetic function algorithm (GFA), and genetic PLS (G/PLS)] and a non-linear method [artificial neural network (ANN)]. All five linear modeling methods indicate the importance of the n-octanol/water partition coefficient (logP), along with different topological and electronic parameters. The best model obtained from the training set (stepwise regression), selected on the basis of the highest external predictive R^2 value and the lowest root mean square error of prediction (RMSEP), also showed good internal predictive power. The other models (FA-MLR, PLS, GFA and G/PLS) also showed statistically significant internal and external validation characteristics. The best model according to r_m^2 for the test set [as defined by P.P. Roy, K. Roy, QSAR Comb. Sci. 27 (2008) 302-313] was obtained from ANN; it showed a good r^2 value (determination coefficient between observed and predicted values) for the test set compounds, superior to those of the other statistical models except the stepwise regression derived model. However, based on the r_m^2 value (test set), which penalizes a model for large differences between observed and predicted values, the stepwise MLR model was inferior to all other methods except PLS. Considering the r_m^2 value for the whole set, the G/PLS derived model appears to be the best predictive model for this data set.
For choosing the best predictive model from among comparable models, r_m^2 for the whole set, calculated from leave-one-out predicted values for the training set and model-derived predicted values for the test set compounds, is suggested as a good criterion.
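
The abstract names its validation metrics but does not give their formulas. The sketch below is an illustration only, assuming the forms commonly used in the QSAR literature: r_m^2 = r^2 * (1 - sqrt(r^2 - r_0^2)), where r_0^2 is the determination coefficient of a regression of observed on predicted values forced through the origin; external predictive R^2 referenced to the training-set mean; and RMSEP as the root mean squared test-set error. All function names are hypothetical, and ordinary least squares stands in for whichever model is being validated.

```python
import numpy as np


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    return np.corrcoef(obs, pred)[0, 1] ** 2


def r0_squared(obs, pred):
    """Determination coefficient for observed vs. predicted, intercept fixed at zero."""
    k = np.sum(obs * pred) / np.sum(pred ** 2)  # slope of the through-origin fit
    ss_res = np.sum((obs - k * pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def rm_squared(obs, pred):
    """Assumed form: r_m^2 = r^2 * (1 - sqrt(r^2 - r_0^2))."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    r2, r02 = r_squared(obs, pred), r0_squared(obs, pred)
    # Clamp at zero to guard against tiny negative values from rounding.
    return r2 * (1.0 - np.sqrt(max(r2 - r02, 0.0)))


def r2_pred(obs_test, pred_test, train_mean):
    """External predictive R^2, referenced to the training-set mean activity."""
    obs_test, pred_test = np.asarray(obs_test, float), np.asarray(pred_test, float)
    press = np.sum((obs_test - pred_test) ** 2)
    return 1.0 - press / np.sum((obs_test - train_mean) ** 2)


def rmsep(obs_test, pred_test):
    """Root mean square error of prediction on the test set."""
    obs_test, pred_test = np.asarray(obs_test, float), np.asarray(pred_test, float)
    return np.sqrt(np.mean((obs_test - pred_test) ** 2))


def loo_predictions(X, y):
    """Leave-one-out predictions from an ordinary least-squares model (stand-in model)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    preds = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i  # drop compound i, refit, then predict it
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ coef
    return preds
```

Under these assumptions, the whole-set r_m^2 described above would be obtained by concatenating the observed activities of the training and test sets, concatenating `loo_predictions` for the training set with the model's predictions for the test set, and passing both arrays to `rm_squared`. A constant offset between observed and predicted values leaves r^2 at 1 but lowers r_0^2, which is how r_m^2 penalizes systematic prediction error.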

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cytochrome P-450 CYP3A / metabolism
  • Cytochrome P-450 CYP3A Inhibitors*
  • Enzyme Inhibitors / chemistry*
  • Enzyme Inhibitors / metabolism
  • Enzyme Inhibitors / pharmacology*
  • Factor Analysis, Statistical
  • Least-Squares Analysis
  • Linear Models
  • Models, Molecular*
  • Neural Networks, Computer
  • Quantitative Structure-Activity Relationship
  • Reproducibility of Results

Substances

  • Cytochrome P-450 CYP3A Inhibitors
  • Enzyme Inhibitors
  • Cytochrome P-450 CYP3A
  • CYP3A4 protein, human