EPIDSeg-Net: A DRR-Guided Multi-Modal Fusion Framework for Precise Segmentation of MV-EPID Lung Targets in Radiotherapy

Technol Cancer Res Treat. 2026 Jan-Dec:25:15330338251414224. doi: 10.1177/15330338251414224. Epub 2026 Feb 3.

Abstract

Background: Integrating Digitally Reconstructed Radiograph (DRR) images of pulmonary tumors with Electronic Portal Imaging Device (EPID) images to assist target segmentation, and then comparing the morphology of the segmented targets across radiotherapy stages, enables precise quantification of dynamic variations in target volume and shape. This provides objective evidence for treatment response evaluation and dynamic optimization of treatment plans, thereby enhancing the precision of radiotherapy delivery.

Methods: The proposed multimodal segmentation framework, EPIDSeg-Net, comprises an encoder, a multi-scale feature layer, and a decoder. The encoder uses a dual-branch architecture: a CNN branch extracts local texture features, and a Swin-Transformer branch captures global semantic features. The model first calibrates the multimodal input features through a Dual Attention Mechanism (DAM) that adaptively adjusts modality-specific weights, improving tolerance to missing image information in multi-sequence segmentation. Two further modules operate within the multi-scale feature layer: a Large-Kernel Grouped Attention Gating (LKG-Gate) module strengthens local contextual awareness, and a Multi-Path Feature Extraction (MPFE) module improves feature robustness through a parallel structure. Together, these designs allow the model to focus on lung tumor target regions, optimize segmentation accuracy, and achieve high-performance reconstruction.

Results: The framework effectively integrates multimodal features, achieving high-precision localization and sharp boundary delineation while preserving anatomical detail. Quantitative evaluation demonstrates superior performance: DICE = 93.2 (92.4–93.9), CE = 0.352, HD95 = 9.42 (6.03–12.8), IOU = 86.0 (84.1–87.9), and sensitivity = 0.828. Overall, the model excels at preserving gradient information, regional integrity, and fine detail; effectively suppresses feature loss; and reduces the missed-segmentation rate, improving both subjective and objective performance metrics.

Conclusion: The proposed segmentation method effectively integrates information from EPID and DRR images, enabling more precise localization and segmentation of lesion regions within EPID images while enhancing segmentation accuracy.
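The modality-calibration step described in Methods can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: it models the dual attention idea as a squeeze-and-excitation-style channel attention followed by a spatial gate, applied to each modality's feature map before fusion by summation. All function names, the sigmoid attention form, and the feature shapes are illustrative choices; the paper does not specify them at this level of detail.

```python
import numpy as np

def channel_attention(feat):
    # Squeeze: global average pooling over spatial dims -> one value per channel
    pooled = feat.mean(axis=(1, 2))             # shape (C,)
    # Excite: sigmoid turns pooled responses into per-channel weights in (0, 1)
    weights = 1.0 / (1.0 + np.exp(-pooled))
    return feat * weights[:, None, None]

def spatial_attention(feat):
    # Collapse channels into a single saliency map, gate every channel by it
    saliency = feat.mean(axis=0)                # shape (H, W)
    gate = 1.0 / (1.0 + np.exp(-saliency))
    return feat * gate[None, :, :]

def dual_attention_fusion(epid_feat, drr_feat):
    # Calibrate each modality independently, then fuse by summation.
    # Because each branch is reweighted on its own, a weak or missing modality
    # contributes less to the fused map, which mirrors the stated tolerance
    # to missing image information in multi-sequence segmentation.
    epid_cal = spatial_attention(channel_attention(epid_feat))
    drr_cal = spatial_attention(channel_attention(drr_feat))
    return epid_cal + drr_cal

rng = np.random.default_rng(0)
epid = rng.standard_normal((16, 32, 32))        # (C, H, W) EPID branch features
drr = rng.standard_normal((16, 32, 32))         # (C, H, W) DRR branch features
fused = dual_attention_fusion(epid, drr)
print(fused.shape)                              # (16, 32, 32)
```

In a real dual-branch network, the attention weights would be learned (e.g., via small fully connected or convolutional layers) rather than computed directly from the features as here; the sketch only shows how per-modality reweighting precedes fusion.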

Keywords: medical image processing; multi-scale features; multimodal image segmentation; tumor target region.

MeSH terms

  • Algorithms
  • Humans
  • Image Processing, Computer-Assisted* / methods
  • Lung Neoplasms* / diagnostic imaging
  • Lung Neoplasms* / pathology
  • Lung Neoplasms* / radiotherapy
  • Radiotherapy Planning, Computer-Assisted* / methods
  • Radiotherapy, Image-Guided* / methods
  • Tomography, X-Ray Computed / methods