An improved understanding of the forces that determine the specificity of drugs for their targets is important for drug design and discovery, as well as for gaining knowledge about molecular recognition. Here, we present a machine learning approach that includes all approved drugs with a known protein target. The drugs were characterized using easily interpretable physico-chemical descriptors. Employing the Random Forest method, we were able to predict whether a drug binds to a soluble or a membrane protein with an average accuracy of 84% and an average area under the curve of 0.91. The high average performance suggests that there are general physico-chemical differences between drugs that bind to membrane protein targets and drugs that bind to soluble protein targets. Variable importance measures in combination with permutation tests were used to find the most influential descriptors. This resulted in six descriptors that stood out, all of which involve drug flexibility and lipophilicity, suggesting that drugs binding to membrane protein targets are in general more flexible and lipophilic and, conversely, that drugs binding to soluble protein targets are more rigid and hydrophilic. Given the notion that ligands are, in general, blueprints of their protein pockets, we may also draw general conclusions about protein-pocket properties, which may add to the understanding of molecular recognition.
Keywords: Drug selectivity; Drugs; Machine learning; Protein-ligand recognition; Targetome.
© 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
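The workflow summarized in the abstract (Random Forest classification, followed by variable importance measures assessed with permutation tests) might be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the data are synthetic stand-ins generated with scikit-learn rather than the study's drug descriptors, the coding of the class labels is arbitrary, and the label-permutation scheme used to obtain empirical p-values is an assumption about how such a test could be set up. The helper `make_forest` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a drug descriptor table (NOT the study's data).
# Assumed coding: y = 1 for a membrane target, y = 0 for a soluble target.
# With shuffle=False, the first 3 columns are informative, the rest are noise.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)


def make_forest(seed):
    """Hypothetical helper: a Random Forest with illustrative settings."""
    return RandomForestClassifier(n_estimators=100, random_state=seed)


# Cross-validated accuracy and area under the ROC curve.
acc = cross_val_score(make_forest(0), X, y, cv=5, scoring="accuracy").mean()
auc = cross_val_score(make_forest(0), X, y, cv=5, scoring="roc_auc").mean()

# Impurity-based variable importance on the real labels.
observed = make_forest(0).fit(X, y).feature_importances_

# Null distribution: refit on permuted labels, which breaks any link
# between descriptors and target class while keeping the class balance.
rng = np.random.default_rng(0)
n_perm = 30
null = np.empty((n_perm, X.shape[1]))
for i in range(n_perm):
    y_perm = rng.permutation(y)
    null[i] = make_forest(i + 1).fit(X, y_perm).feature_importances_

# Empirical one-sided p-value per descriptor (with the usual +1 correction).
p_values = (1 + (null >= observed).sum(axis=0)) / (n_perm + 1)

print(f"accuracy = {acc:.2f}, AUC = {auc:.2f}")
for j, (imp, p) in enumerate(zip(observed, p_values)):
    print(f"descriptor {j}: importance = {imp:.3f}, p = {p:.3f}")
```

Descriptors whose observed importance exceeds nearly all importances obtained under permuted labels receive small p-values and would be the candidates for "most influential". A real analysis would use many more permutations than the 30 chosen here for speed.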