Background: Multi-label medical image classification is challenging due to complex inter-label dependencies, data imbalance, and the need to integrate multiple data modalities. These challenges hinder the development of robust and interpretable diagnostic systems capable of leveraging diverse clinical information.
Method: We propose a cancer risk stratification framework that combines univariate thresholding with multivariate modeling in a hybrid parallel deep learning architecture, MedFusionNet. First, univariate thresholds are applied to identify the top-N most discriminative features for each label. The selected features are then fed into MedFusionNet, which integrates self-attention mechanisms, dense connections, and feature pyramid networks (FPNs). The architecture is further extended to multi-modal learning by fusing image data with the corresponding textual and clinical metadata. Self-attention captures dependencies across image regions, labels, and modalities; dense connections enable efficient feature propagation; and FPNs support multi-scale representation and cross-modal fusion.
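The abstract does not give implementation details, so the following is a minimal sketch of how the two-stage pipeline could look in PyTorch. Every concrete choice here is an illustrative assumption rather than the authors' implementation: the per-label SelectKBest feature selection, the function and class names (select_top_n, DenseBlock, MedFusionNet), the layer widths, the pooled two-level FPN-style branch, and the four-token attention fusion.

```python
# A hypothetical sketch of the described two-stage pipeline.
# All names and dimensions are assumptions, not the paper's code.
import torch
import torch.nn as nn
from sklearn.feature_selection import SelectKBest, f_classif

def select_top_n(X, Y, n=10):
    """Stage 1 (assumed reading of 'univariate thresholding'): for each
    label, keep the n clinical features with the highest univariate
    F-scores, then take the union across labels."""
    keep = set()
    for label in range(Y.shape[1]):
        selector = SelectKBest(f_classif, k=n).fit(X, Y[:, label])
        keep.update(selector.get_support(indices=True).tolist())
    return sorted(keep)

class DenseBlock(nn.Module):
    """Dense connections: each conv layer receives the concatenation of
    the input and all preceding feature maps (DenseNet-style)."""
    def __init__(self, in_ch, growth=32, layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv2d(in_ch + i * growth, growth, 3, padding=1)
            for i in range(layers))

    def forward(self, x):
        feats = [x]
        for conv in self.layers:
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        return torch.cat(feats, dim=1)  # in_ch + layers * growth channels

class MedFusionNet(nn.Module):
    """Stage 2 sketch: an image branch with dense connections and two
    pooled FPN-style scales, linear encoders for the selected clinical
    features and a text embedding, and a self-attention layer that fuses
    the modality tokens before multi-label prediction."""
    def __init__(self, n_clinical, n_text, n_labels, dim=128):
        super().__init__()
        self.stem = nn.Conv2d(1, 32, 7, stride=2, padding=3)
        self.dense = DenseBlock(32)
        # FPN-style 1x1 lateral convs; each scale is pooled to one token
        self.lat1 = nn.Conv2d(32, dim, 1)
        self.lat2 = nn.Conv2d(32 + 3 * 32, dim, 1)
        self.clin = nn.Linear(n_clinical, dim)  # selected clinical features
        self.text = nn.Linear(n_text, dim)      # e.g. pooled report embedding
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, n_labels)

    def forward(self, image, clinical, text):
        c1 = torch.relu(self.stem(image))            # coarse image features
        c2 = self.dense(c1)                          # densely connected stage
        t1 = self.lat1(c1).mean(dim=(2, 3))          # pooled FPN level 1
        t2 = self.lat2(c2).mean(dim=(2, 3))          # pooled FPN level 2
        tokens = torch.stack(
            [t1, t2, self.clin(clinical), self.text(text)], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)  # cross-modal attention
        return self.head(fused.mean(dim=1))           # multi-label logits
```

In this reading, each modality contributes one or more tokens to a shared self-attention layer, so cross-modal and cross-label dependencies are learned jointly rather than combined by late score averaging.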
Results: Extensive evaluations on multiple datasets, including NIH ChestX-ray14 and a custom cervical cancer dataset, confirm that MedFusionNet consistently outperforms existing models. The framework delivers higher accuracy, improved robustness, and enhanced interpretability compared to traditional deep learning approaches.
Conclusions: MedFusionNet provides an effective and scalable solution for multi-label medical image classification and cancer risk stratification. By integrating multi-modal information and advanced architectural components, it improves predictive performance while maintaining high interpretability, making it well-suited for real-world clinical applications.
Plain language summary: Medical images play an important role in helping doctors assess a person’s risk of developing cancer. However, these images can be challenging for computer systems to interpret, especially when several findings appear together, or when pieces of clinical information (such as patient history or handwritten notes) are stored separately. We developed a more reliable computational approach that first identifies the most important clinical features linked to cancer risk. These features are then analyzed using a model called MedFusionNet, which brings together information from medical images, clinical data, and text. This combined approach helps the system recognize patterns that might be overlooked when each type of information is considered on its own. When evaluated on large public datasets and a separate clinical dataset, MedFusionNet showed more accurate and consistent results than other commonly used techniques. These improvements may support earlier detection of cancers, reduce uncertainty in cancer diagnosis, and help clinicians make clearer and more informed decisions.