Introduction/background: Segmentation of gastrointestinal (GI) organs-at-risk (OARs) is a critical yet time-consuming step in MR-guided adaptive radiotherapy (MRgRT), with manual delineation prone to inter- and intra-observer variability. While deep learning approaches have shown promise, their clinical adoption requires not only accuracy but also interpretability and reliability. This study benchmarks two widely used convolutional architectures, U-Net and Residual U-Net (ResUNet), for abdominal OAR segmentation, with an emphasis on explainability-oriented quantitative analysis.
Methods: An anonymized abdominal MRI dataset was used to train and evaluate U-Net and ResUNet using a 5-fold stratified group cross-validation strategy. Segmentation performance was assessed using the Dice Similarity Coefficient (DSC), Intersection-over-Union (IoU), and the 95th percentile Hausdorff Distance (HD95). Explainability was investigated using Gradient-weighted Class Activation Mapping (Grad-CAM) computed from the final convolutional layer of each network. To enable objective analysis beyond qualitative visualization, Grad-CAM activation maps were quantified using numerical localization metrics relative to ground-truth organ masks, including in-organ energy ratio, boundary energy ratio, pointing accuracy, activation Dice coefficient, centroid distance and activation entropy. Grad-CAM metrics were aggregated across gastrointestinal organs and averaged over the five validation folds.
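The following is a minimal illustrative sketch (not the authors' implementation) of how such Grad-CAM localization metrics could be computed for a single 2D case, assuming the activation map and the ground-truth organ mask are NumPy arrays; the threshold and boundary-band width used here are illustrative parameters, not values taken from the study.

```python
# Illustrative sketch: per-case Grad-CAM localization metrics relative to a
# binary ground-truth organ mask. Parameter choices are assumptions.
import numpy as np
from scipy import ndimage

def gradcam_localization_metrics(cam, mask, act_threshold=0.5, band_px=3):
    """cam: 2D non-negative Grad-CAM map; mask: 2D binary organ mask."""
    cam = np.clip(cam, 0, None).astype(float)
    mask = mask.astype(bool)
    total = cam.sum() + 1e-8

    # In-organ energy ratio: fraction of activation energy inside the organ.
    in_organ_energy = cam[mask].sum() / total

    # Boundary energy ratio: activation within a thin band around the organ
    # contour (band width band_px is an assumed parameter).
    band = (ndimage.binary_dilation(mask, iterations=band_px)
            ^ ndimage.binary_erosion(mask, iterations=band_px))
    boundary_energy = cam[band].sum() / total

    # Pointing hit: does the activation maximum fall inside the organ?
    peak = np.unravel_index(np.argmax(cam), cam.shape)
    pointing_hit = bool(mask[peak])

    # Activation Dice: overlap between the thresholded CAM and the organ mask.
    cam_bin = (cam >= act_threshold * cam.max()) if cam.max() > 0 else np.zeros_like(mask)
    activation_dice = 2.0 * (cam_bin & mask).sum() / (cam_bin.sum() + mask.sum() + 1e-8)

    # Centroid distance (pixels) between the activation-weighted centroid
    # and the organ-mask centroid.
    cam_centroid = np.array(ndimage.center_of_mass(cam))
    mask_centroid = np.array(ndimage.center_of_mass(mask.astype(float)))
    centroid_distance = float(np.linalg.norm(cam_centroid - mask_centroid))

    # Activation entropy: Shannon entropy of the normalized activation map.
    p = cam.ravel() / total
    activation_entropy = float(-(p[p > 0] * np.log(p[p > 0])).sum())

    return dict(in_organ_energy=in_organ_energy, boundary_energy=boundary_energy,
                pointing_hit=pointing_hit, activation_dice=activation_dice,
                centroid_distance=centroid_distance, activation_entropy=activation_entropy)
```

Under this sketch, case-level values would then be averaged across organs and validation folds (e.g., pointing accuracy as the mean of the per-case pointing hits), consistent with the aggregation described above.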
Results: Both architectures demonstrated comparable segmentation performance across organs, with no statistically significant differences in any of the evaluated metrics. Grad-CAM analysis showed similar region-level attention patterns, with in-organ activation ratios of 71.4 ± 8.6% for U-Net and 66.2 ± 9.1% for ResUNet, boundary energy ratios of 24.1 ± 4.9% and 21.8 ± 5.2%, respectively, and pointing accuracies exceeding 70% for both models. Uncertainty analysis based on inter-fold variability and boundary error dispersion indicated comparable stability and bounded worst-case behavior.
Discussion/conclusion: By integrating quantitative indicators of performance, uncertainty, and explainability, this study provides an informed benchmark of two deep learning models for abdominal OAR segmentation. The results suggest that both U-Net and ResUNet exhibit stable and interpretable behavior under the evaluated configurations, supporting their potential use in MR-guided adaptive radiotherapy workflows where reliability and clinical trust are essential.
Keywords: MR-guided radiotherapy (MRgRT); MRI segmentation; Organs-at-risk (OARs); ResUNet; U-Net.