A significant challenge in high-throughput screening (HTS) campaigns is the identification of assay technology interference compounds. A Compound Interfering with an Assay Technology (CIAT) gives false readouts in many assays. CIATs are often considered viable hits and investigated in follow-up studies, thus impeding research and wasting resources. In this study, we developed a machine-learning (ML) model to predict CIATs for three assay technologies. The model was trained on known CIATs and non-CIATs (NCIATs) identified in artefact assays and described by their 2D structural descriptors. Usual methods identifying CIATs are based on statistical analysis of historical primary screening data and do not consider experimental assays identifying CIATs. Our results show successful prediction of CIATs for existing and novel compounds and provide a complementary and wider set of predicted CIATs compared to BSF, a published structure-independent model, and to the PAINS substructural filters. Our analysis is an example of how well-curated datasets can provide powerful predictive models despite their relatively small size.
Keywords: assay interference; computational chemistry; frequent hitters; high-throughput screening; machine learning.
© 2019 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.