Multiview Self-Supervised Segmentation for OARs Delineation in Radiotherapy

Evid Based Complement Alternat Med. 2021 Mar 5;2021:8894222. doi: 10.1155/2021/8894222. eCollection 2021.

Abstract

Radiotherapy has become a common treatment option for head and neck (H&N) cancer, and organs at risk (OARs) must be delineated to achieve a highly conformal dose distribution. Manual delineation of OARs is time-consuming and inaccurate, so automatic delineation based on deep learning models has been proposed to delineate the OARs accurately. However, state-of-the-art performance usually requires a large amount of delineated data, and collecting pixel-level manual delineations is labor intensive and may not be necessary for representation learning. Encouraged by recent progress in self-supervised learning, this study proposes and evaluates a novel multiview contrastive representation learning method to boost models with unlabelled data. The proposed learning architecture leverages three views of CTs (the coronal, sagittal, and transverse planes) to collect positive and negative training samples. Specifically, a 3D CT is first projected into three 2D views (coronal, sagittal, and transverse planes); a convolutional neural network then takes the three views as inputs and outputs three individual representations in latent space; and finally, a contrastive loss pulls the representations of different views of the same image closer ("positive pairs") and pushes the representations of views from different images ("negative pairs") apart. To evaluate performance, we collected 220 CT images of H&N cancer patients. The experiments demonstrate that our method significantly improves quantitative performance over the state of the art (from 83% to 86% in absolute Dice score). Thus, our method provides a powerful and principled means of dealing with the label-scarcity problem.
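The abstract does not specify the exact form of the contrastive objective, so the following is only a minimal sketch of how a multiview contrastive loss of the kind described could look, assuming a shared 2D CNN encoder has already mapped the coronal, sagittal, and transverse views of each CT to latent vectors, and assuming an InfoNCE-style, temperature-scaled loss in which the three views of the same CT form positive pairs and views from different CTs form negative pairs. All names and the temperature value are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def multiview_contrastive_loss(z_cor, z_sag, z_tra, temperature=0.1):
    """z_cor, z_sag, z_tra: (B, D) latent representations of the coronal,
    sagittal, and transverse views of the same batch of B CT volumes."""
    B = z_cor.size(0)
    # Stack the views: rows 0..B-1 coronal, B..2B-1 sagittal, 2B..3B-1 transverse.
    z = F.normalize(torch.cat([z_cor, z_sag, z_tra], dim=0), dim=1)  # (3B, D)
    sim = z @ z.t() / temperature  # pairwise cosine similarities, scaled
    # Exclude self-similarity from the softmax denominator.
    sim.fill_diagonal_(float("-inf"))
    # Rows i, i+B, and i+2B all come from the same CT -> positive pairs.
    idx = torch.arange(3 * B, device=z.device)
    pos1 = (idx + B) % (3 * B)      # first positive partner of each anchor
    pos2 = (idx + 2 * B) % (3 * B)  # second positive partner of each anchor
    log_prob = F.log_softmax(sim, dim=1)
    # Maximize the likelihood of both positives relative to all negatives.
    loss = -(log_prob[idx, pos1] + log_prob[idx, pos2]) / 2
    return loss.mean()
```

In this sketch, each anchor view is pulled toward the two other views of the same CT and pushed away from all views of the other CTs in the batch, which matches the positive-pair/negative-pair behavior described in the abstract.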