Linking electron ionization mass spectra of organic chemicals to toxicity endpoints through machine learning and experimentation

J Hazard Mater. 2022 Jun 5;431:128558. doi: 10.1016/j.jhazmat.2022.128558. Epub 2022 Feb 23.

Abstract

Quantitative structure-activity relationship (QSAR) modeling is widely used to predict the potential harm of chemicals, but its predictions rely heavily on accurate annotation of chemical structures. In many cases, such as complex water environments, the exact structure of an unknown compound is difficult to determine. Here, we address this problem by linking electron ionization mass spectra (EI-MS) of organic chemicals directly to toxicity endpoints using several machine learning methods. The method was verified by predicting two endpoints: 50% growth inhibition of Tetrahymena pyriformis (T. pyriformis) and liver toxicity. The best-performing models achieved R² > 0.7 or balanced accuracy > 0.72 on both the training set and the test set. External experiments further verified the method's potential for predicting the toxicity of unknown chemicals. Feature importance analysis identified the critical spectral features responsible for chemical-induced toxicity. Our approach has potential for toxicity prediction in fields where accurate chemical structures are difficult to determine.
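As an illustration only, the general idea of using a spectrum rather than a structure as model input, together with the two reported metrics, might be sketched as follows. The abstract does not describe the feature encoding, so the unit-m/z binning, the base-peak normalization, the `max_mz` cutoff, and all function names here are assumptions made for this sketch, not the authors' implementation.

```python
from typing import Iterable, List, Sequence, Tuple


def bin_spectrum(peaks: Iterable[Tuple[float, float]], max_mz: int = 500) -> List[float]:
    """Turn (m/z, intensity) peaks into a fixed-length vector, one slot per integer m/z.

    Assumption: unit-mass binning and base-peak scaling; the source does not
    specify how spectra were encoded.
    """
    vector = [0.0] * (max_mz + 1)
    for mz, intensity in peaks:
        idx = int(round(mz))  # nearest integer m/z
        if 0 <= idx <= max_mz:
            vector[idx] += intensity
    base = max(vector)
    # Scale to the base peak so spectra of different absolute intensity are comparable.
    return [v / base for v in vector] if base > 0 else vector


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, the metric for a continuous endpoint."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot  # undefined if all y_true values are equal


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = toxic, 0 = non-toxic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)  # requires both classes to be present


# Hypothetical peak list, invented for illustration only.
features = bin_spectrum([(77.0, 50.0), (105.1, 100.0), (122.0, 25.0)], max_mz=200)
```

A vector such as `features` could then be passed to any regression or classification model; binary labels are assumed for the classification metric, and balanced accuracy is useful there because it is not inflated when one class dominates the data set.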

Keywords: Chemical structure identification; Chemical toxicity prediction; Environmental health and safety; Machine learning; Mass spectra.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Electrons*
  • Machine Learning
  • Organic Chemicals / toxicity
  • Quantitative Structure-Activity Relationship
  • Tetrahymena pyriformis*

Substances

  • Organic Chemicals