Accurate identification of carcinogenic hazards is essential for public health protection, yet traditional animal-based assays are time-consuming, expensive, and ethically challenging. Many existing quantitative structure-activity relationship (QSAR) models predict overall carcinogenicity but often lack the organ-level specificity needed for drug development and risk assessment. To fill this gap, we curated a high-quality dataset of 945 compounds from mammalian carcinogenicity tests covering the endocrine, exocrine, hepatobiliary, respiratory, and urinary systems. We then built machine learning-driven QSAR models that integrate Tox21 bioactivity endpoints, RDKit molecular descriptors, and molecular fingerprints (ECFP6, FCFP6, and MACCS) to capture the mechanistic and structural drivers of organ-specific carcinogenic potential. The top-performing models, including CatBoost and neural networks, were trained on RDKit descriptors or on combined descriptor-fingerprint feature sets and showed acceptable to good predictive ability (F1 = 0.68-0.88; AUC = 0.64-0.83). Feature importance analyses revealed that binary substructure fingerprints drive the endocrine and respiratory predictions, whereas quantitative physicochemical descriptors dominate the hepatobiliary and urinary models. Tox21 bioactivity endpoints, particularly cytochrome P450 (CYP450) inhibition assays, ranked highly in the exocrine carcinogenicity models, consistent with the role of CYP450 enzymes in xenobiotic metabolism. The top-performing models are available through a web dashboard, offering a rapid screening tool to prioritize chemicals for targeted, in-depth evaluation and marking a significant advance in organ-specific carcinogenicity prediction.
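The abstract reports model quality as F1 and AUC. As a rough illustration only, the sketch below shows how these two metrics are commonly defined for a binary carcinogen/non-carcinogen classifier. It is not the authors' evaluation code. The labels, scores, and the 0.5 decision threshold are hypothetical toy values, and AUC is computed with the rank-based (Mann-Whitney) definition.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (carcinogen = 1) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = carcinogenic for a given organ system, 0 = not.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]   # hypothetical predicted probabilities
threshold = 0.5                             # assumed decision threshold
y_pred = [1 if s >= threshold else 0 for s in scores]

f1 = f1_score(y_true, y_pred)   # 2/3: precision = recall = 2/3
auc = roc_auc(y_true, scores)   # 7/9: 7 of 9 positive-negative pairs ranked correctly
print(f"F1 = {f1:.3f}, AUC = {auc:.3f}")
```

F1 depends on the chosen threshold, whereas AUC depends only on how the scores rank positives against negatives. This is one reason the two metrics can diverge across organ-system models.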
Keywords: Carcinogenicity; Feature importance; Machine learning; Organ system; Quantitative structure-activity relationship (QSAR); Tox21.
Copyright © 2025 Elsevier Ltd. All rights reserved.