HYG-mol: An Interpretable Multimodal Hypergraph Framework for Molecular Property Prediction

Comput Struct Biotechnol J. 2026 Apr 9;35(1):0036. doi: 10.34133/csbj.0036. eCollection 2026.

Abstract

Accurate molecular property prediction is a central task in drug discovery, yet existing methods often struggle to simultaneously capture higher-order molecular structures and chemically grounded semantic information. Graph neural networks are limited to pairwise atomic interactions, while language models based on the Simplified Molecular Input Line Entry System (SMILES) lack explicit structural grounding, leading to incomplete structure-property representations. In this work, we propose HYG-mol, an interpretable molecular property prediction framework that integrates hypergraph-based structural modeling with multimodal chemical semantics. Molecules are represented as hypergraphs in which chemically meaningful substructures, such as functional groups and ring systems, are explicitly encoded as hyperedges, enabling direct modeling of higher-order structural dependencies. Chemical semantic information derived from a pretrained language model is fused with physicochemical descriptors at the atomic level. A hypergraph attention network is employed to capture cross-scale interactions and to identify substructures relevant to the prediction task. Extensive evaluations on MoleculeNet benchmark datasets demonstrate that HYG-mol consistently outperforms state-of-the-art baseline methods across both classification and regression tasks. Ablation and interpretability analyses further validate the effectiveness of the proposed representation and reveal strong correspondence between model-identified substructures and chemically meaningful motifs. Overall, HYG-mol provides a unified and interpretable framework for molecular property prediction by explicitly grounding chemical semantics in higher-order structural representations.
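
To make the hypergraph representation described in the abstract concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It uses a hand-labeled toy molecule (benzoic acid, heavy atoms only), assumes that each bond is kept as a size-2 hyperedge alongside the larger substructure hyperedges, and builds an atom-by-hyperedge incidence matrix. The names `build_incidence`, `substructures`, `node_degree`, and `edge_size` are hypothetical, and the substructure labels are written by hand rather than detected by a cheminformatics toolkit.

```python
# Toy molecule: benzoic acid, heavy atoms only (hand-labeled; an assumption).
atoms = ["C", "C", "C", "C", "C", "C", "C", "O", "O"]
bonds = [
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),  # six-membered ring
    (5, 6), (6, 7), (6, 8),                          # carboxyl carbon and oxygens
]

# Higher-order hyperedges: chemically meaningful substructures.
# These are written by hand here; they are not computed from the structure.
substructures = {
    "benzene_ring": [0, 1, 2, 3, 4, 5],
    "carboxyl": [6, 7, 8],
}


def build_incidence(n_atoms, hyperedges):
    """Return an n_atoms x n_hyperedges 0/1 incidence matrix as nested lists."""
    H = [[0] * len(hyperedges) for _ in range(n_atoms)]
    for e, members in enumerate(hyperedges):
        for v in members:
            H[v][e] = 1  # atom v participates in hyperedge e
    return H


# Assumption: bonds are kept as size-2 hyperedges next to the substructures.
hyperedges = [list(b) for b in bonds] + list(substructures.values())
H = build_incidence(len(atoms), hyperedges)

# Number of hyperedges each atom belongs to.
node_degree = [sum(row) for row in H]
# Number of atoms in each hyperedge.
edge_size = [sum(H[v][e] for v in range(len(atoms))) for e in range(len(hyperedges))]
```

In this sketch, atom 5 belongs to three bond hyperedges and to the ring hyperedge, so its degree is 4, whereas a pairwise graph would record only the three bonds. A hypergraph attention layer would then operate on `H` together with per-atom feature vectors; that step is not shown here.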