Predicting the hepatocarcinogenic potential of alkenylbenzene flavoring agents using toxicogenomics and machine learning

Toxicol Appl Pharmacol. 2010 Mar 15;243(3):300-14. doi: 10.1016/j.taap.2009.11.021. Epub 2009 Dec 11.


Identification of carcinogenic activity is the primary goal of the 2-year bioassay. The expense of these studies limits the number of chemicals that can be studied, and therefore chemicals need to be prioritized based on a variety of parameters. We have developed an ensemble of support vector machine classification models based on male F344 rat liver gene expression following 2, 14 or 90 days of exposure to a collection of hepatocarcinogens (aflatoxin B1, 1-amino-2,4-dibromoanthraquinone, N-nitrosodimethylamine, methyleugenol) and non-hepatocarcinogens (acetaminophen, ascorbic acid, tryptophan). Seven models were generated based on individual exposure durations (2, 14 or 90 days) or a combination of exposures (2+14, 2+90, 14+90 and 2+14+90 days). All but one of the data sets yielded models with 0% cross-validation error. Independent validation of the models was performed using expression data from the livers of rats exposed at 2 dose levels to a collection of alkenylbenzene flavoring agents. Depending on the model used and the exposure duration of the test data, independent validation error rates ranged from 47% to 10%. The variable with the most notable effect on independent validation accuracy was the exposure duration of the alkenylbenzene test data. All models generally exhibited improved performance as the exposure duration of the alkenylbenzene data increased. The models differentiated between hepatocarcinogenic (estragole and safrole) and non-hepatocarcinogenic (anethole, eugenol and isoeugenol) alkenylbenzenes previously studied in a carcinogenicity bioassay. In the case of safrole, the models correctly differentiated between carcinogenic and non-carcinogenic dose levels.
The models predict that two alkenylbenzenes not previously assessed in a carcinogenicity bioassay, myristicin and isosafrole, would be weakly hepatocarcinogenic if studied at a dose level of 2 mmol/kg bw/day for 2 years in male F344 rats, suggesting that these chemicals should receive higher priority than other untested alkenylbenzenes for evaluation in the carcinogenicity bioassay. The results of the study indicate that gene expression-based predictive models are an effective tool for identifying hepatocarcinogens. Furthermore, we find that exposure duration is a critical variable in the success or failure of such an approach, particularly when evaluating chemicals with unknown carcinogenic potency.
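
The seven models named in the abstract correspond to every non-empty combination of the three exposure durations. A minimal Python sketch of that enumeration follows; the names `DURATIONS`, `training_sets`, `label` and `MODEL_LABELS` are illustrative assumptions, not taken from the study, and the sketch does not reproduce the support vector machine training itself.

```python
from itertools import combinations

# Exposure durations in days, as stated in the abstract.
DURATIONS = (2, 14, 90)


def training_sets(durations=DURATIONS):
    """Return every non-empty combination of durations, smallest sets first."""
    sets = []
    for size in range(1, len(durations) + 1):
        sets.extend(combinations(durations, size))
    return sets


def label(combo):
    """Format a combination the way the abstract writes it, e.g. '2+14'."""
    return "+".join(str(days) for days in combo)


# One label per model: three single-duration and four combined-duration sets.
MODEL_LABELS = [label(combo) for combo in training_sets()]
print(MODEL_LABELS)
```

Three durations give 2^3 - 1 = 7 non-empty combinations, which matches the seven models reported.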

Publication types

  • Research Support, N.I.H., Intramural
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Artificial Intelligence*
  • Benzene Derivatives / toxicity*
  • Blood Cell Count
  • Blood Chemical Analysis
  • Carcinogens / toxicity
  • Cluster Analysis
  • Dose-Response Relationship, Drug
  • Flavoring Agents / toxicity*
  • Food Additives / toxicity
  • Gene Expression / drug effects
  • Genome-Wide Association Study
  • Liver / metabolism
  • Liver Neoplasms / chemically induced*
  • Liver Neoplasms / genetics
  • Male
  • Mutagenicity Tests
  • Oligonucleotide Array Sequence Analysis
  • RNA / biosynthesis
  • RNA / isolation & purification
  • Rats
  • Rats, Inbred F344
  • Reproducibility of Results
  • Toxicogenetics / methods*


Substances

  • Benzene Derivatives
  • Carcinogens
  • Flavoring Agents
  • Food Additives
  • RNA