Machine Learning Methods for Endocrine Disrupting Potential Identification Based on Single-Cell Data

Chem Eng Sci. 2023 Nov 5:281:119086. doi: 10.1016/j.ces.2023.119086. Epub 2023 Jul 18.

Abstract

Humans are continuously exposed to a variety of toxicants and chemicals, and this exposure is exacerbated during and after environmental catastrophes such as floods, earthquakes, and hurricanes. The hazardous chemical mixtures generated during these events threaten the health and safety of humans and other living organisms. This necessitates the development of rapid decision-making tools that help mitigate the adverse effects of exposure on key modulators of the endocrine system, such as the estrogen receptor alpha (ERα). The mechanistic stages of estrogenic transcriptional activity can be measured at the single-cell level with high content/high throughput microscopy-based biosensor assays, which generate millions of object-based minable data points. By combining computational modeling and experimental analysis, we built a highly accurate data-driven classification framework to assess the endocrine disrupting potential of environmental compounds. The framework predicts whether a compound acts on the ERα pathway as a receptor agonist or antagonist, using principal component analysis (PCA) projections of high throughput, high content image analysis descriptors. It also combines rigorous preprocessing steps with nonlinear machine learning algorithms, such as Support Vector Machines and Random Forest classifiers, to develop highly accurate mathematical representations of the separation between ERα agonists and antagonists. The results show that Support Vector Machines classify unseen chemicals correctly with more than 96% accuracy using the proposed framework, where the preprocessing and PCA steps play a key role in suppressing experimental noise and unraveling hidden patterns in the dataset.

Keywords: Classification analysis; Endocrine disrupting chemicals; Estrogen receptor activity; High throughput microscopy; Machine learning; Predictive modeling.
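
The workflow outlined in the abstract (preprocessing, PCA projection, then a nonlinear classifier) can be illustrated with a minimal sketch using scikit-learn. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic Gaussian "descriptors", the choice of standard scaling as the preprocessing step, the five principal components, the RBF kernel, and the helper names make_synthetic_descriptors and build_classifier. The accuracies it prints describe only this toy data and say nothing about the reported results.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_descriptors(n_per_class=100, n_features=20, seed=0):
    """Hypothetical stand-in for per-compound image analysis descriptors."""
    rng = np.random.default_rng(seed)
    agonists = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, n_features))
    antagonists = rng.normal(loc=1.5, scale=1.0, size=(n_per_class, n_features))
    X = np.vstack([agonists, antagonists])
    y = np.array([0] * n_per_class + [1] * n_per_class)  # 0 = agonist, 1 = antagonist
    return X, y


def build_classifier(kind, n_components=5):
    """Scale features, project onto principal components, then classify."""
    if kind == "svm":
        clf = SVC(kernel="rbf")  # assumed kernel; the abstract does not name one
    elif kind == "random_forest":
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
    else:
        raise ValueError(f"unknown classifier kind: {kind}")
    return make_pipeline(StandardScaler(), PCA(n_components=n_components), clf)


X, y = make_synthetic_descriptors()

# Hold out a stratified test set to mimic evaluation on unseen chemicals.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

accuracies = {}
for kind in ("svm", "random_forest"):
    model = build_classifier(kind)
    model.fit(X_train, y_train)  # scaler and PCA are fitted on training data only
    accuracies[kind] = model.score(X_test, y_test)
    print(f"{kind}: held-out accuracy = {accuracies[kind]:.2f}")
```

Wrapping the scaler and PCA in the same pipeline as the classifier keeps the held-out compounds out of the fitting of the preprocessing steps, which is one common way to avoid leaking test information into the projection.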