Prediction of human intestinal absorption by GA feature selection and support vector machine regression

Int J Mol Sci. 2008 Oct;9(10):1961-76. doi: 10.3390/ijms9101961. Epub 2008 Oct 20.


QSAR (Quantitative Structure-Activity Relationship) models for predicting human intestinal absorption (HIA) were built with molecular descriptors calculated by ADRIANA.Code, by Cerius(2), and by a combination of the two. A dataset of 552 compounds with experimental HIA values, covering a wide range of current drugs, was investigated. A Genetic Algorithm (GA) feature selection method was applied to select suitable descriptors. A Kohonen's self-organizing Neural Network (KohNN) map was used to split the whole dataset into a training set of 380 compounds and a test set of 172 compounds. First, the six descriptors selected from ADRIANA.Code and the six descriptors selected from Cerius(2) were each used as inputs for building quantitative models with Partial Least Squares (PLS) analysis and Support Vector Machine (SVM) regression. Then, two further models were built with PLS and SVM, respectively, on nine descriptors selected from the combined pool of ADRIANA.Code and Cerius(2) descriptors. For the three SVM models, correlation coefficients (r) of 0.87, 0.89 and 0.88 and standard deviations (s) of 10.98, 9.72 and 9.14 were obtained for the test set.
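The abstract names GA feature selection and reports test-set r and s, but gives neither the GA operators nor the formulas. The Python sketch below shows one way to realize both, under stated assumptions. The names `ga_select`, `pearson_r` and `residual_sd` are hypothetical. The fitness function is assumed to be supplied by the caller; in a QSAR setting it might be a cross-validated score of a PLS or SVM model trained on the selected descriptors. Truncation selection, union-based crossover and swap mutation are illustrative choices, not the authors' settings, and s is assumed here to be sqrt(SSE / (n - 1)).

```python
import math
import random
from typing import Callable, Sequence, Tuple

Subset = Tuple[int, ...]


def pearson_r(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y)
    mean_y = sum(y) / n
    mean_p = sum(y_hat) / n
    cov = sum((a - mean_y) * (b - mean_p) for a, b in zip(y, y_hat))
    var_y = sum((a - mean_y) ** 2 for a in y)
    var_p = sum((b - mean_p) ** 2 for b in y_hat)
    return cov / math.sqrt(var_y * var_p)


def residual_sd(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Standard deviation of prediction errors.

    Assumption: s = sqrt(SSE / (n - 1)); the paper's exact formula may differ.
    """
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return math.sqrt(sse / (len(y) - 1))


def ga_select(n_features: int, k: int, fitness: Callable[[Subset], float],
              pop_size: int = 30, generations: int = 50,
              mutation_rate: float = 0.2, seed: int = 0) -> Subset:
    """Hypothetical GA searching for a k-descriptor subset that maximizes `fitness`."""
    rng = random.Random(seed)

    def random_subset() -> Subset:
        return tuple(sorted(rng.sample(range(n_features), k)))

    def crossover(a: Subset, b: Subset) -> Subset:
        # The child draws k distinct descriptors from the union of both parents.
        pool = sorted(set(a) | set(b))
        return tuple(sorted(rng.sample(pool, k)))

    def mutate(s: Subset) -> Subset:
        # With probability mutation_rate, swap one descriptor for an unused one.
        if rng.random() >= mutation_rate:
            return s
        unused = [i for i in range(n_features) if i not in s]
        if not unused:
            return s
        genes = list(s)
        genes[rng.randrange(k)] = rng.choice(unused)
        return tuple(sorted(genes))

    population = [random_subset() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        # Truncation selection: the top fifth (at least two) become parents.
        parents = ranked[:max(2, pop_size // 5)]
        # Elitism: parents survive unchanged, so the best fitness never decreases.
        next_gen = list(parents)
        while len(next_gen) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            next_gen.append(mutate(crossover(p1, p2)))
        population = next_gen
    return max(population, key=fitness)


if __name__ == "__main__":
    # Toy fitness (not real HIA data): descriptors 0-5 are "informative".
    informative = set(range(6))

    def toy_fitness(s: Subset) -> float:
        return len(informative & set(s))

    best = ga_select(n_features=20, k=6, fitness=toy_fitness)
    print("selected descriptors:", best, "fitness:", toy_fitness(best))

    # r and s for a small made-up prediction vector.
    y_obs = [10.0, 40.0, 70.0, 95.0]
    y_pred = [15.0, 35.0, 75.0, 90.0]
    print("r =", round(pearson_r(y_obs, y_pred), 3),
          "s =", round(residual_sd(y_obs, y_pred), 3))
```

In a real run, `fitness` would fit the chosen model on the training-set columns indexed by the subset and return a cross-validated score; keeping the test set out of the fitness evaluation avoids optimistically biased test statistics.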

Keywords: Genetic Algorithm Feature Selection; Human intestinal absorption (HIA); Kohonen’s self-organizing Neural Network (KohNN); Quantitative Structure Activity Relationships (QSAR); Support Vector Machine (SVM).