Deep neural nets as a method for quantitative structure-activity relationships

Junshui Ma; Robert P Sheridan; Andy Liaw; George E Dahl; Vladimir Svetnik

doi:10.1021/ci500747n

J Chem Inf Model. 2015 Feb 23;55(2):263-74. doi: 10.1021/ci500747n. Epub 2015 Feb 17.

Authors

Junshui Ma¹, Robert P Sheridan, Andy Liaw, George E Dahl, Vladimir Svetnik

Affiliation

¹ Biometrics Research Department and Structural Chemistry Department, Merck Research Laboratories, Rahway, New Jersey 07065, United States.

PMID: 25635324
DOI: 10.1021/ci500747n

Abstract

Neural networks were widely used for quantitative structure-activity relationships (QSAR) in the 1990s. Because of various practical issues (e.g., slow on large problems, difficult to train, prone to overfitting), they were superseded by more robust methods such as support vector machines (SVM) and random forests (RF), which arose in the early 2000s. The last 10 years have witnessed a revival of neural networks in the machine learning community thanks to new methods for preventing overfitting, more efficient training algorithms, and advances in computer hardware. In particular, deep neural nets (DNNs), i.e., neural nets with more than one hidden layer, have found great success in many applications, such as computer vision and natural language processing. Here we show that DNNs can routinely make better prospective predictions than RF on a set of large, diverse QSAR data sets taken from Merck's drug discovery effort. The number of adjustable parameters needed for DNNs is fairly large, but our results show that it is not necessary to optimize them for individual data sets: a single set of recommended parameters can achieve better performance than RF for most of the data sets we studied. The usefulness of these recommended parameters is demonstrated on additional data sets not used in the calibration. Although training DNNs is still computationally intensive, using graphics processing units (GPUs) can make this issue manageable.
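
To make the abstract's definition of a DNN (a neural net with more than one hidden layer) concrete, the following is a minimal sketch of a forward pass through a net with two hidden layers that maps a molecule's descriptor vector to one predicted activity. It is not the authors' implementation: the layer sizes, the ReLU activation, the random initialisation, the descriptor values, and the names `init_layer` and `predict_activity` are all assumptions chosen for illustration, and no training is performed.

```python
import random

def relu(values):
    """Rectified linear unit applied element-wise (assumed activation)."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def predict_activity(descriptors, layers):
    """Forward pass: ReLU on each hidden layer, linear output for regression."""
    h = descriptors
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))
    weights, biases = layers[-1]
    return dense(h, weights, biases)[0]

rng = random.Random(0)
# Hypothetical sizes: 8 descriptors -> 16 -> 8 hidden units -> 1 activity value.
sizes = [8, 16, 8, 1]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
# Every layer except the last is hidden; more than one makes the net "deep".
n_hidden_layers = len(layers) - 1

molecule = [1.0, 0.0, 2.0, 0.0, 0.0, 1.0, 3.0, 0.0]  # made-up descriptor counts
prediction = predict_activity(molecule, layers)
```

With untrained random weights the value of `prediction` is meaningless; the sketch only shows the layered structure that separates a DNN from a single-hidden-layer net.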

MeSH terms

Algorithms
Drug Discovery
Machine Learning
Neural Networks, Computer*
Prospective Studies
Quantitative Structure-Activity Relationship*
Support Vector Machine
Workflow