Environ Health Perspect. 2019 Apr;127(4):47002.
doi: 10.1289/EHP3986.

The Carcinogenome Project: In Vitro Gene Expression Profiling of Chemical Perturbations to Predict Long-Term Carcinogenicity

Free PMC article

Amy Li et al. Environ Health Perspect.

Abstract

Background: Most chemicals in commerce have not been evaluated for their carcinogenic potential. The de facto gold-standard approach to carcinogen testing is the 2-year rodent bioassay, a time-consuming and costly procedure. High-throughput in vitro assays are a promising alternative for addressing the limitations in carcinogen screening.

Objectives: We developed a screening process for predicting chemical carcinogenicity and genotoxicity and characterizing modes of action (MoAs) using in vitro gene expression assays.

Methods: We generated a large toxicogenomics resource comprising [Formula: see text] expression profiles corresponding to 330 chemicals profiled in HepG2 (a human hepatocellular carcinoma cell line) at multiple doses and replicates. Predictive models of carcinogenicity and genotoxicity were built using a random forest classifier. Differential pathway enrichment analysis was performed to identify pathways associated with carcinogen exposure. Signatures of carcinogenicity and genotoxicity were compared with external sources, including DrugMatrix and the Connectivity Map.

Results: Among profiles with sufficient bioactivity, our classifiers achieved an area under the ROC curve (AUC) of 72.2% for predicting carcinogenicity and 82.3% for predicting genotoxicity. Chemical bioactivity, as measured by the strength and reproducibility of the transcriptional response, was not significantly associated with long-term carcinogenicity at doses up to [Formula: see text]. However, sufficient bioactivity was necessary for a chemical to be used for prediction of carcinogenicity. Pathway enrichment analysis revealed pathways consistent with known drivers of cancer, including DNA damage and repair. The data are available at https://clue.io/CRCGN_ABC, and a portal for query and visualization of the results is accessible at https://carcinogenome.org.

Discussion: We demonstrated an in vitro screening approach using gene expression profiling to predict carcinogenicity and infer MoAs of chemical perturbations. https://doi.org/10.1289/EHP3986.
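
The AUC values reported in the Results can be read as the probability that a randomly chosen positive (e.g., carcinogenic) profile receives a higher predicted score than a randomly chosen negative one. The following is a minimal sketch of that rank-based definition, not the authors' pipeline; the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class, 0 for the negative class.
    scores: predicted probability of the positive class.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three positives, three negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to a random classifier, which is the reference line drawn in Figure 2.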

Figures

Figure 1A is a box and whisker plot with TAS values ranging between 0.0 and 0.8 at intervals of 0.2 (y-axis) across dose rank 1 (n equals 330), 2 (n equals 330), 3 (n equals 330), 4 (n equals 330), 5 (n equals 330), and 6 (n equals 332) (x-axis). Figure 1C is a box and whisker plot with TAS values ranging between 0.0 and 0.8 at intervals of 0.2 (y-axis) across dose rank 1 (n equals 34, 168, 128), 2 (n equals 34, 168, 128), 3 (n equals 34, 168, 128), 4 (n equals 34, 168, 128), 5 (n equals 34, 168, 128), and 6 (n equals 33, 163, 126) (x-axis) for chemicals of unknown carcinogenicity, noncarcinogenic chemicals, and carcinogenic chemicals. Figure 1B consists of four panels of box and whisker plots with TAS values ranging between 0.0 and 0.8 at intervals of 0.2 (y-axis) across dose rank (x-axis). The TAS ranges for the four panels are as follows: TAS equals 0 to 0.2; TAS equals 0.2 to 0.4; TAS equals 0.4 to 0.6; and TAS equals 0.6 to 1. Figure 1D is a scatter plot with Cmax values (y-axis) across mean TAS (x-axis) for the carcinogenic chemicals and noncarcinogenic chemicals in the following groups: Group 1, expected weak response based on Cmax; Group 2, less potent in vitro than expected by Cmax; Group 3, more potent in vitro than expected by Cmax; and Group 4, expected strong response based on Cmax.
Figure 1.
Box plot of transcriptional activity scores (TAS) by sample subsets. (A) Box plot of TAS distributions for each dose level (rank 1 = lowest dose; rank 6 = highest dose). Numeric labels indicate the significance of paired one-sided two-group TAS comparisons between adjacent dose groups, adjusted for multiple comparisons across doses using the false discovery rate (FDR) method (* = FDR < 0.05; *** = FDR < 0.001) (see “Methods” section). (B) Box plot of TAS distribution for each dose level, binned by TAS subsets. (C) Distribution of TAS grouped by chemical carcinogenicity within each dose level. p-Values indicate the significance of unpaired one-sided two-group comparisons between the TAS of carcinogenic chemicals and the TAS of noncarcinogenic chemicals within each dose group. (D) Scatterplot of mean TAS per chemical and the ratio of equivalent in vitro dose (Cmax) over maximum in vitro dose (40 μM) (see “Methods” section for Cmax calculation). Box plots in Panels A, B, and C have the following specifications: the lower, middle, and upper hinges correspond to the 25th, 50th (median), and 75th percentiles, respectively; the lower and upper whiskers extend to the smallest and largest values at most 1.5 × IQR (interquartile range) from the hinges; and data points beyond the whiskers are represented as dots.
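
The FDR adjustment named in the caption for Panel A can be illustrated with the Benjamini-Hochberg step-up procedure. The sketch below is an assumption about the general technique rather than the authors' code; the function name `benjamini_hochberg` and the five p-values (one per adjacent-dose comparison between six dose ranks) are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each sorted p-value p_(k) is scaled by m / k, then a running minimum
    taken from the largest rank downward enforces monotonicity.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for the five adjacent dose-rank comparisons.
pvals = [0.001, 0.008, 0.039, 0.041, 0.20]
fdr = benjamini_hochberg(pvals)
for p, q in zip(pvals, fdr):
    print(f"p = {p:.3f} -> FDR = {q:.5f}")
```

With these toy inputs, the third and fourth comparisons share the same adjusted value because the running minimum carries the smaller scaled p-value downward.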
Figure 2A is a predictive model of carcinogenicity which consists of the following: a tabular representation; a box and whisker plot showing AUC scores (y-axis) across four TAS subsets (x-axis); and a line graph plotting average TPR (y-axis) across FPRs (x-axis) for the four TAS subsets. Figure 2B is a predictive model of genotoxicity which consists of the following: a tabular representation; a box and whisker plot showing AUC scores (y-axis) across four TAS subsets (x-axis); and a line graph plotting average TPR (y-axis) across FPRs (x-axis) for the four TAS subsets.
Figure 2.
Performance of classifiers in predictive models of (A) carcinogenicity, and (B) genotoxicity. From left to right: a) Summary statistics tables of area under the ROC curve (AUC) for each transcriptional activity score (TAS) subset; data represented are the median, mean, and SE (standard error) of the AUC scores. b) Box plots of AUC across resamples (n=25) for each TAS subset, with the lower, middle, and upper hinges corresponding to the 25th, 50th (median), and 75th percentiles, respectively, the lower and upper whiskers extending to the smallest and largest values at most 1.5 × IQR (interquartile range) from the hinges, and data points beyond the whiskers represented as dots. The dotted line at 0.5 represents the expected AUC of a random classifier. Labels in each TAS group (“n=”) represent the number of unique chemicals in the model training and validation step. c) Receiver operating characteristic (ROC) curves [false positive rate (FPR) vs. average true positive rate (TPR)]. Thick lines represent vertical averaging of ROC curves across resamples in each TAS group, shown with bars denoting the standard errors. Thin, semitransparent lines represent ROC curves of individual resamples in each TAS group.
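
Vertical averaging, as mentioned for the ROC curves in panel c, fixes a grid of FPR values and averages the TPR of each resample's curve at every grid point. The sketch below shows one common way to do this under stated assumptions: the helper names (`roc_points`, `tpr_at`, `vertical_average`), the two toy resamples, and the three-point FPR grid are all hypothetical, and the authors' exact procedure may differ.

```python
import math


def roc_points(labels, scores):
    """ROC curve as (fpr, tpr) points, from (0, 0) to (1, 1)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all examples tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def tpr_at(points, fpr):
    """TPR at a given FPR: the highest TPR on a vertical segment,
    linear interpolation between points otherwise."""
    prev = points[0]
    for pt in points[1:]:
        if pt[0] > fpr:
            (x0, y0), (x1, y1) = prev, pt
            return y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
        prev = pt
    return prev[1]


def vertical_average(curves, grid):
    """Mean TPR and its standard error across curves at each grid FPR."""
    means, ses = [], []
    for f in grid:
        vals = [tpr_at(c, f) for c in curves]
        n = len(vals)
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / (n - 1) if n > 1 else 0.0
        means.append(mean)
        ses.append(math.sqrt(var / n))
    return means, ses


# Two hypothetical resamples sharing the same scores but different labels.
scores = [0.9, 0.8, 0.3, 0.1]
curve_a = roc_points([1, 1, 0, 0], scores)
curve_b = roc_points([1, 0, 1, 0], scores)
grid = [0.0, 0.5, 1.0]
means, ses = vertical_average([curve_a, curve_b], grid)
print(means, ses)
```

The standard errors returned alongside the means play the role of the bars drawn on the thick averaged curves.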
Figure 3A is a horizontal bar graph plotting feature names (y-axis) across signed variable importance (mean decrease in Gini; x-axis) for the carcinogens and noncarcinogens. A tabular representation with columns labeled gene symbol and gene title lies adjacent to the bar graph. Figure 3B is a horizontal bar graph plotting feature names (y-axis) across signed variable importance (mean decrease in Gini; x-axis) for the genotoxicants and nongenotoxicants. A tabular representation with columns labeled gene symbol and gene title lies adjacent to the bar graph.
Figure 3.
Top 20 landmark gene features for prediction of (A) carcinogenicity, and (B) genotoxicity, as ranked by variable importance (mean decrease in Gini index) in the predictive models for the transcriptional activity score (TAS) > 0.4 subset.
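
Mean decrease in Gini index accumulates, over the trees of a random forest, how much each split on a feature reduces class impurity. As a hedged illustration of the quantity being accumulated, the sketch below computes the impurity decrease for a single binary split; the function names, toy expression values, and threshold are hypothetical, and the signed importance shown in the figure is not reproduced here.

```python
def gini(labels):
    """Gini impurity of a set of binary (0/1) class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting samples at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


# Hypothetical toy data: one informative and one uninformative gene feature.
labels = [0, 0, 1, 1]
informative = [0.1, 0.2, 0.8, 0.9]
uninformative = [0.1, 0.8, 0.2, 0.9]
print(gini_decrease(informative, labels, 0.5))
print(gini_decrease(uninformative, labels, 0.5))
```

A feature whose splits separate the classes cleanly yields a large decrease, whereas a feature unrelated to the class yields a decrease near zero.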
Figure 4 comprises two forest plots showing predicted probability of class positive (x-axis) across chemical names (y-axis), one each for carcinogenicity and genotoxicity. Key is as follows: actual class is negative or positive; dose ranks from 1 to 6.
Figure 4.
Dot plot of probabilities of predicted classes for hold-out chemicals in the transcriptional activity score (TAS) > 0.4 subset. Point outline colors represent actual class labels (carcinogenic vs. noncarcinogenic, genotoxic vs. nongenotoxic). Point shapes represent dose ranks (dose rank 6 represents the highest dose level for each chemical). x-Axis positions of points represent predicted probability of class “Positive” (carcinogenic in left column or genotoxic in right column), e.g., at the cutoff of 0.5 (vertical line), instances with values >0.5 are predicted “Positive,” and those with <0.5 are predicted “Negative.”
Figure 5 consists of six box and whisker plots. The first three plot CMap Perturbagen Classes (y-axis) across connectivity score (x-axis) for three TAS subsets (TAS greater than 0.2; TAS greater than 0.2; and TAS greater than 0.4) for carcinogens and noncarcinogens. The final three plot the same for genotoxicants and nongenotoxicants.
Figure 5.
Connectivity scores of top Connectivity Map (CMap) Perturbagen classes with differential connectivity [false discovery rate (FDR) < 0.05] to carcinogens vs. noncarcinogens and genotoxicants vs. nongenotoxicants, grouped by transcriptional activity score (TAS) subsets. The lower, middle, and upper hinges of box plots correspond to the 25th, 50th (median), and 75th percentiles, respectively. The lower and upper whiskers extend to the smallest and largest values at most 1.5 × IQR (interquartile range) from the hinges, and data points beyond the whiskers are represented as dots.
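
The box plot convention described in the captions (hinges at the quartiles, whiskers reaching the most extreme values within 1.5 × IQR of the hinges, remaining points drawn as dots) can be sketched as follows. This is an illustrative reading of the caption, not the plotting code used for the figures; the function name `box_stats`, the toy scores, and the choice of linear-interpolation quartiles are assumptions.

```python
import statistics


def box_stats(values):
    """Hinges, whisker ends, and outliers under the 1.5 x IQR rule."""
    data = sorted(values)
    # Quartiles by linear interpolation between order statistics (assumed convention).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical scores with one extreme value.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(stats)
```

In this toy example the value 30 lies beyond the upper fence (7 + 1.5 × 4 = 13), so the upper whisker stops at 8 and 30 would be drawn as a dot.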
Figures 6A and 6B are AhR activity profiles and AhR-related profiles, respectively.
Figure 6.
Investigation of profiles of aryl hydrocarbon receptor (AhR)–related chemical perturbations. (A) Profiles with AhR activity ranked by median gene set scores of AhR target gene lists. (B) AhR-related profiles clustered by connectivity scores.
