, 33 (9), 933-40

Prediction of Human Population Responses to Toxic Compounds by a Collaborative Competition

Federica Eduati et al. Nat Biotechnol.

Erratum in


The ability to computationally predict the effects of toxic compounds on humans could help address the deficiencies of current chemical safety testing. Here, we report the results from a community-based DREAM challenge to predict toxicities of environmental compounds with potential adverse health effects for human populations. We measured the cytotoxicity of 156 compounds in 884 lymphoblastoid cell lines for which genotype and transcriptional data are available as part of the Tox21 1000 Genomes Project. The challenge participants developed algorithms to predict interindividual variability of toxic response from genomic profiles and population-level cytotoxicity data from structural attributes of the compounds. A total of 179 submitted predictions were evaluated against an experimental data set to which participants were blinded. Individual cytotoxicity predictions were better than random, with modest correlations (Pearson's r < 0.28), consistent with complex trait genomic prediction. In contrast, correlations for predictions of population-level response to different compounds were higher (r < 0.66). The results highlight the possibility of predicting health risks associated with unknown compounds, although risk estimation accuracy remains suboptimal.

Conflict of interest statement

Competing financial interests

The authors declare no competing financial interests.


Figure 1. The NIEHS-NCATS-UNC DREAM Toxicogenetics Challenge overview
The cytotoxicity data used in the challenge consist of EC10 values (the estimated effective concentrations that reduced viability by 10%) generated for 884 lymphoblastoid cell lines in response to 156 common environmental compounds. Participants were provided with a training set of cytotoxicity data for 620 cell lines and 106 compounds, along with genotype data for all cell lines, RNA-seq data for 337 cell lines, and chemical attributes for all compounds. The challenge was divided into two independent subchallenges: in subchallenge 1, participants were asked to predict EC10 values for a separate test set of 264 cell lines in response to the 106 compounds (only 91 toxic compounds were used for final scoring); in subchallenge 2, they were asked to predict population parameters (median EC10 values and the 5th to 95th interquantile distance) for a separate test set of 50 compounds.
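The two population parameters targeted in subchallenge 2 can be illustrated with a minimal sketch. It assumes a linear-interpolation percentile, which may differ from the quantile definition used in the challenge; the function names `percentile` and `population_parameters` are hypothetical.

```python
from statistics import median


def percentile(values, q):
    """Percentile by linear interpolation between order statistics (q in 0..100).

    Assumption: the challenge's exact quantile convention is not stated
    in the caption, so this is only one reasonable choice.
    """
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile() needs at least one value")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def population_parameters(ec10_values):
    """Return (median EC10, q95 - q05) across cell lines for one compound."""
    spread = percentile(ec10_values, 95) - percentile(ec10_values, 5)
    return median(ec10_values), spread
```

For one compound, `population_parameters` takes the EC10 values measured across all cell lines and returns the pair of summary values that a subchallenge 2 model would try to predict from chemical attributes alone.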
Figure 2. Significance of predictions
Submissions are compared with the null hypothesis for (a, b) subchallenge 1 and (c, d) subchallenge 2. For each metric used for scoring (Pearson Correlation (a) and probabilistic C-index (b) for subchallenge 1, and Pearson Correlation (c) and Spearman Correlation (d) for subchallenge 2), the performance shown for each submission is computed compound by compound and then averaged across compounds. The null hypothesis is generated from random predictions obtained by random sampling, compound by compound, from the training set. In panels (e, f), the performance of randomly aggregated predictions (wisdom of the crowds, in green) is compared with that of individual predictions (first boxplot, in red). Green boxplots represent performance distributions when 5, 10, 15, 20, and all predictions are randomly selected and aggregated. Performance is shown in terms of the average Pearson Correlation computed between predicted and measured values separately for each compound. Predictions were aggregated by averaging them. To aggregate only independent predictions, each team was represented by a single submission, taken as the average of all predictions submitted by that team.
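The scoring, null model, and aggregation described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the challenge's scoring code: predictions are held as dictionaries mapping a compound to a list of values, null sampling is done with replacement, and a constant vector is scored as zero correlation. All function names are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant vector (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def mean_per_compound_pearson(predicted, measured):
    """Score compound by compound, then average across compounds."""
    scores = [pearson(predicted[c], measured[c]) for c in measured]
    return sum(scores) / len(scores)


def random_prediction(training, n_test, rng):
    """Null prediction: for each compound, sample from its training values.

    Assumption: sampling with replacement; the caption does not say.
    """
    return {c: [rng.choice(vals) for _ in range(n_test)]
            for c, vals in training.items()}


def aggregate(predictions):
    """Wisdom of the crowds: element-wise average of several predictions."""
    compounds = predictions[0].keys()
    return {c: [sum(vals) / len(vals)
                for vals in zip(*(p[c] for p in predictions))]
            for c in compounds}
```

Repeating `random_prediction` many times and scoring each draw with `mean_per_compound_pearson` would give an empirical null distribution against which a submission's score can be compared.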
Figure 3. Performances of predictions
Predictions were compared to the gold standard based on Pearson Correlation for (a) subchallenge 1 and (b) subchallenge 2. The heatmap in (a) illustrates performances of all predictions for all compounds used for evaluation: predictions are ranked as in the final leaderboard and compounds are clustered. Pearson Correlation values are saturated at −0.2 and 0.2. The heatmap in (b) illustrates performances of all ranked predictions for predicted median and interquantile range (q95-q05).
Figure 4. Advantages of using RNA-seq data
Performances of predictions for cell lines for which RNA-seq data were available were compared against performances of predictions for cell lines for which RNA-seq data were not available. Pearson Correlation and prob C-index were computed, for each compound, separately for cell lines for which RNA-seq data were and were not available, and the comparison shows that predictions for cell lines for which RNA-seq data were available are significantly better (paired t-test, p-value ≪ 10^−10). All predictions are included in the analysis regardless of the actual use of the RNA-seq data.
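As a rough illustration of the paired comparison, the sketch below computes a paired t statistic from two lists of per-compound scores. The function name and inputs are hypothetical, and only the statistic and degrees of freedom are computed; a p-value would additionally require the t distribution (for example via `scipy.stats.ttest_rel`).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(with_rnaseq, without_rnaseq):
    """Paired t statistic and degrees of freedom for matched scores.

    Each position holds the score for the same compound, computed on
    cell lines with and without RNA-seq data respectively.
    """
    if len(with_rnaseq) != len(without_rnaseq):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(with_rnaseq, without_rnaseq)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / sqrt(n)), n - 1
```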
Figure 5. Best performing method subchallenge 1 & subchallenge 2
The prediction procedure of the best performing team of subchallenge 1. (a) Workflow of prediction for subchallenge 1. (b) Heatmap of the number of cell lines in each category of "genetic cluster" (1–10, x-axis) and geographic subpopulation (y-axis). (c) Modeling workflow used by team QBRC for Toxicogenetics Challenge subchallenge 2. The model starts by deriving potential toxicity-related features through comparison of response data and chemical descriptor profiles (step 1) and classifies compounds based on their toxicity responses (step 2). Then, group-specific models are built based on group-specific chemical features and the entire training set (step 3). Finally, the toxicity of a new compound is calculated as a weighted average of the predicted toxicities from each group-specific model (step 4). (d) In step 3, differentially distributed features and all training samples are used to develop group-specific models. (e) In step 4, model applicability domain and the similarities between the new compound and the compound group are used to determine the weights for each group-specific model. Details of each step can be found in the main text.
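The final combination step of the subchallenge 2 workflow, a weighted average over group-specific predictions, can be sketched as below. How team QBRC derived the weights from applicability domain and similarity is described only in the main text, so here the weights are simply taken as given non-negative numbers; the function name is hypothetical.

```python
def weighted_average_prediction(group_predictions, weights):
    """Combine group-specific toxicity predictions for one new compound.

    Assumption: weights are supplied by the caller and are non-negative;
    their derivation is outside the scope of this sketch.
    """
    if len(group_predictions) != len(weights):
        raise ValueError("one weight is needed per group-specific prediction")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return sum(p * w for p, w in zip(group_predictions, weights)) / total
```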
Figure 6. Overview of methods and data used to solve the challenges
Overview of the input data, data reduction techniques, prediction algorithms, and model validation techniques used by participants to solve the challenge. Participants were asked to fill out a survey in order to be included in this publication as part of the NIEHS-NCATS-UNC DREAM Toxicogenetics Challenge consortium; only data for teams that filled out the survey are shown here. Each row corresponds to a submission, and rows are ordered by final rank for subchallenge 1 and subchallenge 2, respectively. Data refer to 75 completed surveys for subchallenge 1 (of 99 submissions) and 51 completed surveys for subchallenge 2 (of 80 submissions). This corresponds to 21 (of 34) teams for subchallenge 1 and 12 (of 23) teams for subchallenge 2.


