J Comput Aided Mol Des. 2020 Jul;34(7):731-746.
doi: 10.1007/s10822-020-00310-4. Epub 2020 Apr 16.

Revealing cytotoxic substructures in molecules using deep learning

Henry E Webel et al. J Comput Aided Mol Des. 2020 Jul.

Abstract

In drug development, late-stage toxicity issues of a compound are the main cause of failure in clinical trials. In silico methods are therefore of high importance to guide the early design process and to reduce time, costs and animal testing. Technical advances and the ever-growing amount of available toxicity data have enabled machine learning, especially neural networks, to impact the field of predictive toxicology. In this study, cytotoxicity prediction, one of the earliest handles in drug discovery, is investigated using a deep learning approach trained on a highly consistent in-house data set of over 34,000 compounds, of which less than 5% are cytotoxic. The model reached a balanced accuracy of over 70%, similar to previously reported studies using Random Forest. Although they yield good results, neural networks are often described as black boxes that offer little mechanistic understanding of the underlying model. To overcome this lack of interpretability, a Deep Taylor Decomposition method is investigated to identify substructures that may be responsible for the cytotoxic effects, the so-called toxicophores. Furthermore, this study introduces cytotoxicity maps, which provide a visual structural interpretation of the relevance of these substructures. This approach could be helpful in drug development to predict the potential toxicity of a compound and to generate new insights into the toxic mechanism. Moreover, it could also help to de-risk and optimize compounds.
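
Because fewer than 5% of the compounds are cytotoxic, plain accuracy says little here, which is presumably why balanced accuracy is reported. The following is a minimal sketch of that metric, not the authors' evaluation code. The function names and the toy labels are hypothetical; the cutoff of 0.17 is taken from the Fig. 4 caption, and treating a score equal to the cutoff as cytotoxic is an assumption.

```python
def classify(scores, cutoff=0.17):
    """Label a compound cytotoxic (1) when its predicted score reaches the cutoff.

    The value 0.17 comes from the Fig. 4 caption; using >= is an assumption.
    """
    return [1 if s >= cutoff else 0 for s in scores]


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on toxic) and specificity (recall on non-toxic)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def accuracy(y_true, y_pred):
    """Fraction of correctly labeled compounds."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy data (made up): 5 toxic and 95 non-toxic compounds.
y_true = [1] * 5 + [0] * 95
always_non_toxic = [0] * 100

print(accuracy(y_true, always_non_toxic))           # high despite finding nothing
print(balanced_accuracy(y_true, always_non_toxic))  # chance level
```

A classifier that never predicts toxicity scores well on plain accuracy for such data but only reaches chance level on balanced accuracy, so the two numbers should not be compared directly.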

Keywords: Cytotoxic substructures; Deep Neural Networks; Deep Taylor Decomposition; Toxicophores.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Fig. 1
The logarithmic-scale plot shows the number of toxic and non-toxic molecules for the two cell lines HEK293 and HepG2. There are approximately 20 times more molecules labeled non-toxic than toxic, making the data set highly imbalanced.
Fig. 2
The Deep Taylor Decomposition method applied to a feedforward neural network with three hidden layers. The inputs to the network are 2048 fingerprint bits. The left diagram represents the network with the ReLU activation function, and the right diagram shows the relevances assigned using the $z^+$ rule. $x_i^l$ and $R_i^l$ denote the $i$th node and its relevance at layer $l$, respectively.
Fig. 3
Workflow for identifying potential toxicophores. The first arrow describes the transformation of the molecules in the training and validation sets into 2048-bit binary vectors encoding Morgan fingerprints of radius 2, computed with RDKit. Each bit represents one (or more) atom environment(s). A black box indicates that the corresponding atom environment is present in the molecule. The second arrow shows that relevance scores can be obtained for each compound using the Deep Taylor Decomposition method described in the “Deep Taylor Decomposition” section and illustrated in Fig. 2. Once relevance scores have been computed for every decomposable molecule, they are averaged using Eq. 5. The bits with the k highest global mean relevance scores are stored and used for further analysis as potential toxicophores.
Fig. 4
(a) Distribution of predicted scores for molecules from the validation set, which was used to calibrate the model's cutoff of 0.17 (indicated by the vertical line) for classifying compounds as cytotoxic. (b) Distribution of global mean relevances of set bits in decomposable compounds in the training and validation sets, which were used to determine the five most important bits (indicated by the vertical line).
Fig. 5
Three compounds from the test set, namely molecule 1, molecule 2B and molecule 3A, that were correctly labeled cytotoxic by the FNN model. (a) highlights bit 713 in red in molecule 1. (b-d) illustrate the cytotoxicity maps for these molecules. The atomic weights are computed using the approach discussed in the “Identification of toxicophores and visualization as cytotoxicity maps” section. The higher the value of the respective global mean relevance, the darker the green coloring.
Fig. 6
Schematic description of the analysis. On the left, molecules 2A-2E from the test set are shown together with the relevant bits highlighted in red. The common core of these five molecules is used as the query for the eMolTox server, and the eMolTox results are summarized on the right, with predicted toxic endpoints in blue.
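
The relevance propagation of Fig. 2 and the bit ranking of Fig. 3 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the weights are random rather than trained, biases are set to zero, relevance starts from the positive part of a single linear output, and Eq. 5 (not reproduced on this page) is assumed to be the mean relevance of a bit over the molecules in which that bit is set. All function names, layer widths and the toy fingerprints are hypothetical.

```python
import numpy as np


def zplus_backward(x_lower, W, R_upper, eps=1e-9):
    """Redistribute relevance one layer down with the z+ rule.

    R_i = sum_j x_i * w_ij^+ / (sum_i' x_i' * w_i'j^+) * R_j
    """
    W_pos = np.maximum(W, 0.0)        # keep positive weights only
    z = x_lower @ W_pos + eps         # denominators, one per upper node
    return x_lower * (W_pos @ (R_upper / z))


def deep_taylor_relevance(x, weights):
    """Forward pass through a ReLU network, then z+ backward pass.

    `weights` is a list of (n_in, n_out) matrices; biases are assumed zero.
    Returns the per-bit relevance and the raw output score.
    """
    activations = [x]
    for W in weights[:-1]:
        activations.append(np.maximum(0.0, activations[-1] @ W))
    score = activations[-1] @ weights[-1]   # single linear output node
    R = np.maximum(score, 0.0)              # only positive evidence is decomposed
    for a, W in zip(reversed(activations), reversed(weights)):
        R = zplus_backward(a, W, R)
    return R, float(score[0])


def top_k_bits(fingerprints, relevances, k=5):
    """Rank bits by mean relevance over molecules in which the bit is set.

    This averaging is an assumed reading of Eq. 5.
    """
    counts = fingerprints.sum(axis=0)
    sums = (relevances * fingerprints).sum(axis=0)
    mean = np.divide(sums, counts, out=np.zeros_like(sums, dtype=float),
                     where=counts > 0)
    return np.argsort(mean)[::-1][:k], mean


# Toy setup: 2048 input bits and three hidden layers as in Fig. 2.
# Layer widths, weights and fingerprints are made up for illustration.
rng = np.random.default_rng(0)
sizes = [2048, 64, 32, 16, 1]
weights = [rng.normal(0.0, 0.1, size=(a, b))
           for a, b in zip(sizes[:-1], sizes[1:])]
fps = (rng.random((20, 2048)) < 0.03).astype(float)
rels = np.array([deep_taylor_relevance(fp, weights)[0] for fp in fps])
bits, mean_rel = top_k_bits(fps, rels, k=5)
print("candidate bits:", bits)
```

With zero biases the z+ rule conserves relevance: the per-bit relevances of a molecule sum to the positive part of its output score, and bits that are not set receive no relevance. In practice the fingerprints would come from RDKit and the weights from the trained model.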
