Comparing LD50/LC50 Machine Learning Models for Multiple Species

J Chem Health Saf. 2023 Mar 27;30(2):83-97. doi: 10.1021/acs.chas.2c00088. Epub 2023 Feb 23.

Abstract

The lethal dose or concentration that kills 50% of the animals (LD50 or LC50) is an important parameter for understanding the toxicity of chemicals in different scenarios. It can be used to make go/no-go decisions and ultimately to assist in choosing the right personal protective equipment needed for containment. The LD50 assessment process has required the use of many animals, although modern methods have reduced the number of rats needed. Because a compound is usually considered highly toxic when its LD50 is lower than 25 mg/kg, such a classification provides potentially valuable safety information to synthetic chemists and other safety assessment scientists. Alternative approaches such as computational methods are needed to reduce animal use for this testing further still. We now summarize our efforts to use public data to build in vivo LD50 or LC50 classification and regression machine learning models for various species (rat, mouse, fish and daphnia), and report their 5-fold cross-validation statistics with different machine learning algorithms as well as results on an external curated test set for mouse LD50. These datasets consist of different molecule classes, may cover different activity ranges, and vary in size. A challenge in using such computational models is that their applicability domain must be understood so that they can be used to make reliable predictions for novel molecules. These machine learning models will also need to be backed up with experimental validation. However, such models could also be used to bridge gaps in individual toxicity datasets. Making such models available also opens them up to potential misuse or dual use.
We summarize these efforts and propose that the models could be used to score the millions of commercially available molecules, most of which likely have no known LD50 or, for that matter, any in vitro or in vivo toxicity data.
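
As an illustration only, the classification threshold and the 5-fold cross-validation scheme mentioned above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `is_highly_toxic` and `five_fold_splits` and the example LD50 values are hypothetical, and only the 25 mg/kg cutoff and the use of five folds come from the abstract.

```python
import random

# Cutoff quoted in the abstract: LD50 lower than 25 mg/kg is "highly toxic".
HIGH_TOXICITY_THRESHOLD_MG_PER_KG = 25.0


def is_highly_toxic(ld50_mg_per_kg, threshold=HIGH_TOXICITY_THRESHOLD_MG_PER_KG):
    """Return True when the LD50 is strictly lower than the threshold."""
    return ld50_mg_per_kg < threshold


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs; each sample is tested once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatable folds
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical LD50 values in mg/kg, invented for illustration only.
example_ld50 = [3.0, 12.5, 25.0, 40.0, 150.0, 480.0, 900.0, 2000.0, 5.0, 60.0]
labels = [is_highly_toxic(value) for value in example_ld50]
splits = list(five_fold_splits(len(example_ld50)))
```

In practice a model would be trained on each `train` index set and scored on the matching `test` set, and the five scores averaged. Note that a value of exactly 25 mg/kg falls outside the highly toxic class here, because the abstract states "lower than".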

Keywords: Acute toxicity; Classification; Dual use; LD50; Machine learning; Regression; in silico predictions.