Light mixed-supervised segmentation for 3D medical image data

Med Phys. 2024 Jan;51(1):167-178. doi: 10.1002/mp.16816. Epub 2023 Nov 1.

Abstract

Background: Accurate 3D semantic segmentation models are essential for many clinical applications. Training a model for 3D segmentation requires voxel-level annotation, which is expensive to obtain because the work is laborious and data access is limited by privacy protection. To annotate 3D medical data such as MRI accurately, a common practice is to contour the volumetric data slice by slice along the principal axes.

Purpose: To reduce the annotation effort per slice, weakly supervised learning with bounding box (Bbox) annotations was proposed, leveraging the discriminative information in the boxes through a tightness prior assumption. However, this approach requires accurate, tight Bboxes, and its performance drops significantly when the tightness assumption does not hold, that is, when a relaxed Bbox is used. There is therefore a need to train a stable model from relaxed Bbox annotations.
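To make the tightness prior concrete, the sketch below shows one common way such a box constraint can be expressed as a loss on a single 2D slice: predicted foreground outside the Bbox is penalized, and every row and column crossing a tight box is required to contain some foreground mass. This is an illustrative PyTorch sketch only; the function name, arguments, and exact constraint form are assumptions and may differ from the formulation used in the paper.

```python
import torch

def box_prior_loss(probs, box_mask, min_fg_per_line=1.0):
    """Illustrative tightness-prior penalty for one 2D slice (hypothetical helper).

    probs:    (H, W) predicted foreground probabilities.
    box_mask: (H, W) float mask, 1.0 inside the (possibly relaxed) Bbox, 0.0 outside.
    """
    # Predicted foreground outside the Bbox is penalized directly.
    outside = (probs * (1.0 - box_mask)).sum()

    # Under a tight box, every row/column crossing the box should carry at least
    # `min_fg_per_line` of predicted foreground mass; relaxing the box weakens
    # how well this assumption holds.
    rows = box_mask.sum(dim=1) > 0
    cols = box_mask.sum(dim=0) > 0
    row_fg = (probs * box_mask).sum(dim=1)[rows]
    col_fg = (probs * box_mask).sum(dim=0)[cols]
    shortfall = (torch.relu(min_fg_per_line - row_fg).sum()
                 + torch.relu(min_fg_per_line - col_fg).sum())

    return outside + shortfall
```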

Methods: This paper presents a mixed-supervised training strategy that reduces the annotation effort for 3D segmentation tasks. In the proposed approach, a full contour annotation is required for only a single slice of each volume, while the remaining slices containing the target are annotated with relaxed Bboxes. The mixed-supervised method combines fully supervised learning, a relaxed Bbox prior, and contrastive learning during training, ensuring that the network properly exploits the discriminative information in the training volumes. The proposed method was evaluated on two public 3D medical imaging datasets (an MRI prostate dataset and a Vestibular Schwannoma [VS] dataset).
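As a rough illustration of how the three training signals could be combined, the hypothetical sketch below sums a supervised loss on the single fully contoured slice, the relaxed Bbox prior (reusing the box_prior_loss sketch above) on the remaining annotated slices, and a simple margin-based contrastive term. All names, shapes, weights, and the pair-sampling scheme are assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def mixed_supervised_loss(logits, full_slice_idx, full_mask, box_masks,
                          embeddings, pos_pairs, neg_pairs,
                          w_box=1.0, w_con=0.1, margin=0.2):
    """Hypothetical combination of the three training signals for one volume.

    logits:         (D, H, W) per-slice foreground logits.
    full_slice_idx: index of the single fully contoured slice.
    full_mask:      (H, W) float ground-truth mask for that slice.
    box_masks:      dict {slice_idx: (H, W) relaxed-Bbox mask} for the other slices.
    embeddings:     (N, C) L2-normalized feature vectors sampled for contrast.
    pos_pairs, neg_pairs: (P, 2) index tensors of same-class / different-class pairs.
    """
    probs = torch.sigmoid(logits)

    # (1) Full supervision on the single contoured slice.
    l_full = F.binary_cross_entropy_with_logits(logits[full_slice_idx], full_mask)

    # (2) Relaxed Bbox prior on the remaining annotated slices
    #     (reuses the box_prior_loss sketch above).
    l_box = sum(box_prior_loss(probs[i], m) for i, m in box_masks.items())

    # (3) Margin-based contrastive term: pull same-class features together,
    #     push different-class features apart.
    pos_sim = (embeddings[pos_pairs[:, 0]] * embeddings[pos_pairs[:, 1]]).sum(-1)
    neg_sim = (embeddings[neg_pairs[:, 0]] * embeddings[neg_pairs[:, 1]]).sum(-1)
    l_con = (1 - pos_sim).mean() + torch.relu(neg_sim - margin).mean()

    return l_full + w_box * l_box + w_con * l_con
```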

Results: The proposed method achieved a high segmentation Dice score of 85.3% on the MRI prostate dataset and 83.3% on the VS dataset with relaxed Bbox annotations, close to the performance of a fully supervised model. Moreover, with the same relaxed Bbox annotations, the proposed method outperforms state-of-the-art methods. More importantly, its performance remains stable when the accuracy of the Bbox annotations varies.

Conclusions: This study proposes a mixed-supervised learning method for 3D medical image segmentation. Its benefit is stable segmentation of the target in 3D images with low annotation-accuracy requirements, which makes model training on large-scale datasets easier.

Keywords: 3D medical images; contrastive learning; mixed-supervised learning; relaxed bounding box.

MeSH terms

  • Humans
  • Image Processing, Computer-Assisted
  • Imaging, Three-Dimensional*
  • Male
  • Neuroma, Acoustic*
  • Pelvis
  • Prostate
  • Supervised Machine Learning