A data-driven weighting scheme for multivariate phenotypic endpoints recapitulates zebrafish developmental cascades

Toxicol Appl Pharmacol. 2017 Jan 1;314:109-117. doi: 10.1016/j.taap.2016.11.010. Epub 2016 Nov 22.


Zebrafish have become a key alternative model for studying health effects of environmental stressors, partly due to their genetic similarity to humans, fast generation time, and the efficiency of generating high-dimensional systematic data. Studies aiming to characterize adverse health effects in zebrafish typically include several phenotypic measurements (endpoints). While there is a solid biomedical basis for capturing a comprehensive set of endpoints, making summary judgments regarding health effects requires thoughtful integration across endpoints. Here, we introduce a Bayesian method to quantify the informativeness of 17 distinct zebrafish endpoints as a data-driven weighting scheme for a multi-endpoint summary measure, called weighted Aggregate Entropy (wAggE). We implement wAggE using high-throughput screening (HTS) data from zebrafish exposed to five concentrations of all 1060 ToxCast chemicals. Our results show that our empirical weighting scheme provides better performance in terms of the Receiver Operating Characteristic (ROC) curve for identifying significant morphological effects and improves robustness over traditional curve-fitting approaches. From a biological perspective, our results suggest that developmental cascade effects triggered by chemical exposure can be recapitulated by analyzing the relationships among endpoints. Thus, wAggE offers a powerful approach for analysis of multivariate phenotypes that can reveal underlying etiological processes.
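As a rough illustration of the two generic ideas the abstract relies on (collapsing several endpoints into one weighted summary score, then judging that score by the area under the ROC curve), a minimal Python sketch follows. It is not the published wAggE method: the Bayesian derivation of the weights and the entropy-based aggregation are not described in this abstract and are not reproduced here. The function names, endpoint names, weights, and data are all hypothetical placeholders.

```python
def weighted_score(endpoint_values, weights):
    """Weighted mean of per-endpoint values (e.g. 0/1 hit calls).

    `weights` stands in for data-driven endpoint weights; how such
    weights would be estimated is outside the scope of this sketch.
    """
    total_weight = sum(weights.values())
    return sum(weights[name] * endpoint_values[name] for name in weights) / total_weight


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive item has the higher score; ties count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical example: three endpoints, made-up weights and hit calls.
weights = {"mortality": 3.0, "yolk_sac_edema": 1.0, "bent_axis": 1.0}
chemicals = [
    {"mortality": 1, "yolk_sac_edema": 1, "bent_axis": 0},  # labelled active
    {"mortality": 0, "yolk_sac_edema": 1, "bent_axis": 1},  # labelled active
    {"mortality": 0, "yolk_sac_edema": 0, "bent_axis": 1},  # labelled inactive
    {"mortality": 0, "yolk_sac_edema": 0, "bent_axis": 0},  # labelled inactive
]
labels = [1, 1, 0, 0]

scores = [weighted_score(c, weights) for c in chemicals]
auc = roc_auc(scores, labels)
```

Under these assumptions, comparing `roc_auc` for scores built with equal weights against scores built with estimated weights is one simple way to check whether a weighting scheme improves discrimination.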

Keywords: Bayesian; Developmental cascade; High-dimensional; Multiple endpoints; Multivariate; Risk assessment; Scoring; ToxRefDB; Zebrafish.

MeSH terms

  • Animals
  • Models, Theoretical
  • Multivariate Analysis
  • Phenotype
  • Zebrafish / embryology*