Gamma-secretase inhibitors have been explored for the prevention and treatment of Alzheimer's disease (AD). Methods for prediction and screening of gamma-secretase inhibitors are highly desired for facilitating the design of novel therapeutic agents against AD, especially when incomplete knowledge about the mechanism and three-dimensional structure of gamma-secretase. We explored two machine learning methods, support vector machine (SVM) and random forest (RF), to develop models for predicting gamma-secretase inhibitors of diverse structures. Quantitative analysis of the receiver operating characteristic (ROC) curve was performed to further examine and optimize the models. Especially, the Youden index (YI) was initially introduced into the ROC curve of RF so as to obtain an optimal threshold of probability for prediction. The developed models were validated by an external testing set with the prediction accuracies of SVM and RF 96.48 and 98.83% for gamma-secretase inhibitors and 98.18 and 99.27% for noninhibitors, respectively. The different feature selection methods were used to extract the physicochemical features most relevant to gamma-secretase inhibition. To the best of our knowledge, the RF model developed in this work is the first model with a broad applicability domain, based on which the virtual screening of gamma-secretase inhibitors against the ZINC database was performed, resulting in 368 potential hit candidates.
2009 Wiley Periodicals, Inc.