Variability in in vivo studies: Defining the upper limit of performance for predictions of systemic effect levels

Comput Toxicol. 2020 Aug 1;15:100126. doi: 10.1016/j.comtox.2020.100126.


New approach methodologies (NAMs) for chemical hazard assessment are often evaluated via comparison to animal studies; however, variability in animal study data limits NAM accuracy. The US EPA Toxicity Reference Database (ToxRefDB) enables consideration of variability in effect levels, including the lowest effect level (LEL) for a treatment-related effect and the lowest observable adverse effect level (LOAEL) defined by expert review, from subacute, subchronic, chronic, multi-generation reproductive, and developmental toxicity studies. The objectives of this work were to quantify the variance within systemic LEL and LOAEL values, defined as potency values for effects in adult or parental animals only, and to estimate the upper limit of NAM prediction accuracy. Multiple linear regression (MLR) and augmented cell means (ACM) models were used to quantify the total variance and the fraction of variance in systemic LEL and LOAEL values explained by available study descriptors (e.g., administration route, study type). The MLR approach considered each study descriptor as an independent contributor to variance, whereas the ACM approach combined categorical descriptors into cells to define replicates. Using these approaches, total variance in systemic LEL and LOAEL values (in log10-mg/kg/day units) ranged from 0.74 to 0.92. Unexplained variance in LEL and LOAEL values, approximated by the residual mean square error (MSE), ranged from 0.20 to 0.39. Considering subchronic, chronic, or developmental study designs separately resulted in similar values. Based on the relationship between MSE and R-squared for goodness-of-fit, the maximal R-squared may approach 55 to 73% for a NAM-based predictive model of systemic toxicity using these data as reference. The root mean square error (RMSE) ranged from 0.47 to 0.63 log10-mg/kg/day, depending on dataset and regression approach, suggesting that a two-sided minimum prediction interval for systemic effect levels may have a width of 58- to 284-fold. These findings suggest quantitative considerations for building scientific confidence in NAM-based systemic toxicity predictions.
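
The arithmetic linking these quantities can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that the maximal R-squared is one minus the ratio of residual MSE to total variance, and that the prediction interval is a normal-theory 95% interval of plus or minus 1.96 times the RMSE on the log10 scale. The function names, the value of z, and the pairing of the range endpoints are all assumptions made here, so the printed values only approximate the reported ranges.

```python
def max_r_squared(residual_mse: float, total_variance: float) -> float:
    """Upper bound on R-squared when residual_mse cannot be explained.

    Assumes R^2 = 1 - MSE / total variance (hypothetical helper).
    """
    return 1.0 - residual_mse / total_variance


def interval_fold_width(rmse_log10: float, z: float = 1.96) -> float:
    """Fold-width of a two-sided interval of +/- z * RMSE in log10 units.

    z = 1.96 (a 95% normal interval) is an assumption, not stated in the abstract.
    """
    return 10 ** (2 * z * rmse_log10)


# Illustrative pairing of the reported endpoints: the abstract does not say
# which MSE value corresponds to which total variance.
for mse, variance in [(0.20, 0.74), (0.39, 0.92)]:
    r2 = max_r_squared(mse, variance)
    print(f"MSE={mse:.2f}, total variance={variance:.2f}: max R-squared ~ {r2:.0%}")

# Reported RMSE endpoints, in log10-mg/kg/day.
for rmse in (0.47, 0.63):
    width = interval_fold_width(rmse)
    print(f"RMSE={rmse:.2f}: interval width ~ {width:.0f}-fold")
```

Because the interval width is exponential in the RMSE, a small change in RMSE on the log10 scale translates into a large change in the fold-width of the interval.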

Keywords: ToxRefDB; in vivo data; predictive models; uncertainty; variance.