Classification of cytochrome P450 activities using machine learning methods

Mol Pharm. 2009 Nov-Dec;6(6):1920-6. doi: 10.1021/mp900217x.


The cytochrome P450 (CYP) system plays an integral part in the metabolism of drugs and other xenobiotics. Knowledge of the structural features required for interaction with any of the different isoforms of the CYP system is therefore immensely valuable in early drug discovery. In this paper, we focus on three major isoforms (CYP 1A2, CYP 2D6, and CYP 3A4) and present a data set of 335 structurally diverse drug compounds classified for their interaction (as substrate, inhibitor, or any interaction) with these isoforms. We also present machine learning models built with a variety of commonly used methods: k-nearest neighbors, decision tree induction using the CHAID and CRT algorithms, random forests, artificial neural networks, and support vector machines using radial basis function (RBF) and homogeneous polynomial kernels. We discuss the physicochemical features relevant for each end point and compare them with those reported in similar studies. Many of these models perform exceptionally well, even with 10-fold cross-validation, yielding corrected classification rates of 81.7 to 91.9% for CYP 1A2, 89.2 to 92.9% for CYP 2D6, and 87.4 to 89.9% for CYP 3A4. Our models help in understanding the structural requirements for CYP interactions and can serve as sensitive tools in virtual screening and in lead optimization for toxicological profiles in drug discovery.
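The evaluation scheme named in the abstract, 10-fold cross-validation of a classifier, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the descriptors, software, or parameters used, so the data below are synthetic, and the function names, the four made-up features, the class labels, and k = 5 are all assumptions chosen for illustration. Only k-nearest neighbors, the simplest of the listed methods, is shown, and the folds are plain rather than stratified.

```python
import math
import random
from collections import Counter


def make_synthetic_data(n=335, n_features=4, seed=0):
    """Hypothetical stand-in for a descriptor table: two Gaussian classes."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2  # made-up labels: 1 = "interacts", 0 = "does not interact"
        center = 1.5 if label else 0.0
        features = [rng.gauss(center, 1.0) for _ in range(n_features)]
        data.append((features, label))
    rng.shuffle(data)
    return data


def knn_predict(train, query, k=5):
    """Majority vote among the k training points closest to query (Euclidean)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_validated_rate(data, n_folds=10, k=5):
    """Fraction of held-out samples classified correctly across all folds."""
    # Every sample lands in exactly one fold, so each is tested exactly once.
    folds = [data[i::n_folds] for i in range(n_folds)]
    correct = 0
    for i, test_fold in enumerate(folds):
        # Train on the other nine folds, predict the held-out one.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        correct += sum(knn_predict(train, x, k) == y for x, y in test_fold)
    return correct / len(data)


data = make_synthetic_data()
rate = cross_validated_rate(data)
print(f"10-fold CV classification rate: {rate:.1%}")
```

Because each compound is predicted only by a model that never saw it during training, the resulting rate estimates performance on unseen compounds rather than fit to the training set. The number printed here reflects the synthetic data only and says nothing about the rates reported in the abstract.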

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Cytochrome P-450 CYP1A2
  • Cytochrome P-450 CYP2D6
  • Cytochrome P-450 Enzyme System / chemistry
  • Cytochrome P-450 Enzyme System / metabolism*
  • Models, Molecular
  • Quantitative Structure-Activity Relationship


Substances

  • Cytochrome P-450 Enzyme System
  • Cytochrome P-450 CYP1A2
  • Cytochrome P-450 CYP2D6