In silico prediction of Tetrahymena pyriformis toxicity for diverse industrial chemicals with substructure pattern recognition and machine learning methods

Chemosphere. 2011 Mar;82(11):1636-43. doi: 10.1016/j.chemosphere.2010.11.043. Epub 2010 Dec 9.

Abstract

There is an increasing need for rapid safety assessment of chemicals by both industry and regulatory agencies throughout the world. In silico techniques are practical alternatives for environmental hazard assessment, especially for addressing the persistence, bioaccumulation and toxicity potential of organic chemicals. Tetrahymena pyriformis toxicity is often used as a toxicity endpoint. In this study, 1571 diverse, unique chemicals were collected from the literature, forming the largest diverse data set for T. pyriformis toxicity. Classification models of T. pyriformis toxicity were developed using substructure pattern recognition and several machine learning methods, including support vector machine (SVM), C4.5 decision tree, k-nearest neighbors and random forest. The results of 5-fold cross-validation showed that the SVM method performed better than the other algorithms. The overall predictive accuracy of the SVM classification model with a radial basis function kernel was 92.2% for the 5-fold cross-validation and 92.6% for the external validation set. Furthermore, several representative substructure patterns characterizing T. pyriformis toxicity were identified via information gain analysis.
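The abstract does not say how information gain was computed. As a rough illustration only, the sketch below assumes the standard Shannon-entropy definition: the gain of a substructure pattern is the entropy of the toxic/non-toxic labels minus the weighted entropy after splitting the chemicals by whether they contain the pattern. The function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import log2
from typing import Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(has_pattern: Sequence[int], labels: Sequence[int]) -> float:
    """Reduction in label entropy from splitting on a binary substructure flag."""
    n = len(labels)
    gain = entropy(labels)
    for value in (0, 1):
        # Labels of the chemicals that do (1) or do not (0) contain the pattern.
        subset = [y for x, y in zip(has_pattern, labels) if x == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain


# Made-up toy data (assumption): 1 = toxic, 0 = non-toxic.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# A hypothetical pattern present in exactly the toxic chemicals.
perfect_flag = [1, 1, 1, 1, 0, 0, 0, 0]
# A hypothetical pattern spread evenly across both classes.
uninformative_flag = [1, 0, 1, 0, 1, 0, 1, 0]

print(information_gain(perfect_flag, labels))
print(information_gain(uninformative_flag, labels))
```

Under this definition, a pattern that separates the two classes perfectly receives the maximum gain for a balanced binary label set, while a pattern distributed evenly across both classes receives none. Ranking patterns by such a score is one plausible way to pick out representative substructures.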

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence
  • Computer Simulation*
  • Decision Trees
  • Hazardous Substances / classification
  • Hazardous Substances / toxicity*
  • Industry
  • Logistic Models
  • Quantitative Structure-Activity Relationship
  • Risk Assessment / methods
  • Tetrahymena pyriformis / drug effects*
  • Toxicity Tests / methods*

Substances

  • Hazardous Substances