Machine Learning Distinguishes with High Accuracy between Pan-Assay Interference Compounds That Are Promiscuous or Represent Dark Chemical Matter

Swarit Jasial; Erik Gilberg; Thomas Blaschke; Jürgen Bajorath

doi:10.1021/acs.jmedchem.8b01404

J Med Chem. 2018 Nov 21;61(22):10255-10264. doi: 10.1021/acs.jmedchem.8b01404. Epub 2018 Nov 13.

Authors

Swarit Jasial¹, Erik Gilberg¹, Thomas Blaschke¹, Jürgen Bajorath¹

Affiliation

¹ Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Endenicher Allee 19c, Rheinische Friedrich-Wilhelms-Universität, D-53115 Bonn, Germany.

PMID: 30422657

Abstract

Assay interference compounds give rise to false positives and cause substantial problems in medicinal chemistry. Nearly 500 compound classes have been designated as pan-assay interference compounds (PAINS), which typically occur as substructures in other molecules. The structural environment of PAINS substructures is likely to play an important role in their potential reactivity. Given the large number of PAINS and their highly variable structural contexts, it is difficult to study context dependence on the basis of expert knowledge. Hence, we applied machine learning to predict PAINS that are promiscuous and distinguish them from others that are mostly inactive. Surprisingly accurate models can be derived using different methods such as support vector machines, random forests, or deep neural networks. Moreover, structural features that favor correct predictions have been identified, mapped, and categorized, shedding light on the structural context dependence of PAINS effects. The machine learning models presented herein further extend the capacity of PAINS filters.

MeSH terms

Computational Biology / methods*
Drug Discovery / methods*
Machine Learning*
Models, Statistical
ROC Curve