Multimodal image translation via deep learning inference model trained in video domain

Jiawei Fan; Zhiqiang Liu; Dong Yang; Jian Qiao; Jun Zhao; Jiazhou Wang; Weigang Hu

doi:10.1186/s12880-022-00854-x

BMC Med Imaging. 2022 Jul 14;22(1):124. doi: 10.1186/s12880-022-00854-x.

Authors

Jiawei Fan^#¹ ² ³, Zhiqiang Liu^#⁴, Dong Yang¹ ² ³, Jian Qiao¹ ² ³, Jun Zhao¹ ² ³, Jiazhou Wang¹ ² ³, Weigang Hu⁵ ⁶ ⁷

Affiliations

¹ Department of Radiation Oncology, Fudan University Shanghai Cancer Center, Shanghai, 200032, People's Republic of China.
² Department of Oncology, Shanghai Medical College Fudan University, Shanghai, 200032, People's Republic of China.
³ Shanghai Key Laboratory of Radiation Oncology, Shanghai, 200032, People's Republic of China.
⁴ National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.
⁵ Department of Radiation Oncology, Fudan University Shanghai Cancer Center, Shanghai, 200032, People's Republic of China. jackhuwg@hotmail.com.
⁶ Department of Oncology, Shanghai Medical College Fudan University, Shanghai, 200032, People's Republic of China. jackhuwg@hotmail.com.
⁷ Shanghai Key Laboratory of Radiation Oncology, Shanghai, 200032, People's Republic of China. jackhuwg@hotmail.com.

^# Contributed equally.

Abstract

Background: Medical image translation is currently implemented in the image domain. Because medical image acquisition is essentially a temporally continuous process, we set out to develop a novel image translation framework, based on deep learning trained in the video domain, for generating synthetic computed tomography (CT) images from cone-beam computed tomography (CBCT) images.

Methods: As a proof of concept, CBCT and CT images from 100 patients were collected to demonstrate the feasibility and reliability of the proposed framework. The CBCT and CT images were registered as paired samples and used as input data for supervised model training. A vid2vid framework based on a conditional GAN, with carefully designed generators, discriminators, and a new spatio-temporal learning objective, was applied to perform CBCT-to-CT image translation in the video domain. Four evaluation metrics, namely mean absolute error (MAE), peak signal-to-noise ratio (PSNR), normalized cross-correlation (NCC), and structural similarity (SSIM), were calculated on all real and synthetic CT images from 10 new test patients to assess model performance.
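The four metrics named above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the abstract does not give the exact definitions, so the zero-mean form of NCC, the single-window (global) form of SSIM, the constants k1 and k2, and the data_range parameter are all assumptions. Images are represented as flat sequences of pixel values, and only the Python standard library is used so the sketch stays self-contained.

```python
import math
from statistics import fmean


def mae(real, synth):
    """Mean absolute error between two equal-length pixel sequences."""
    return fmean(abs(r - s) for r, s in zip(real, synth))


def psnr(real, synth, data_range):
    """Peak signal-to-noise ratio in dB.

    data_range is the assumed dynamic range of the images (hypothetical
    parameter; the abstract does not state the value used).
    """
    mse = fmean((r - s) ** 2 for r, s in zip(real, synth))
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)


def ncc(real, synth):
    """Zero-mean normalized cross-correlation (one common definition)."""
    mr, ms = fmean(real), fmean(synth)
    num = sum((r - mr) * (s - ms) for r, s in zip(real, synth))
    den = math.sqrt(
        sum((r - mr) ** 2 for r in real) * sum((s - ms) ** 2 for s in synth)
    )
    return num / den


def global_ssim(real, synth, data_range, k1=0.01, k2=0.03):
    """SSIM computed over the whole image as a single window.

    This is a simplification: SSIM is usually averaged over local
    sliding windows, which this sketch does not do.
    """
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mr, ms = fmean(real), fmean(synth)
    var_r = fmean((r - mr) ** 2 for r in real)
    var_s = fmean((s - ms) ** 2 for s in synth)
    cov = fmean((r - mr) * (s - ms) for r, s in zip(real, synth))
    return ((2 * mr * ms + c1) * (2 * cov + c2)) / (
        (mr ** 2 + ms ** 2 + c1) * (var_r + var_s + c2)
    )
```

In practice each function would be applied per slice or per patient to the registered real and synthetic CT volumes, and the results averaged over the test set.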

Results: The average values of the four evaluation metrics (MAE, PSNR, NCC, and SSIM) were 23.27 ± 5.53, 32.67 ± 1.98, 0.99 ± 0.0059, and 0.97 ± 0.028, respectively. Most pixel-wise differences in Hounsfield unit (HU) values between real and synthetic CT images were within 50 HU. The synthetic CT images agreed closely with the real CT images, and image quality was improved, with less noise and fewer artifacts than in the CBCT images.
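The pixel-wise comparison reported above can be expressed as the fraction of pixels whose absolute HU difference falls within a tolerance. The sketch below is illustrative only: the function name, the inclusive threshold, and the flat-sequence image representation are assumptions, not details taken from the study.

```python
def fraction_within(real, synth, tolerance_hu=50.0):
    """Fraction of pixels with |real - synth| <= tolerance_hu.

    The inclusive comparison is an assumption; the abstract only says
    that most differences are "within 50".
    """
    diffs = [abs(r - s) for r, s in zip(real, synth)]
    return sum(d <= tolerance_hu for d in diffs) / len(diffs)


# Toy values in HU (made up for illustration, not study data).
real_ct = [0, 100, -1000, 40]
synthetic_ct = [10, 200, -990, 45]
print(fraction_within(real_ct, synthetic_ct))  # 0.75
```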

Conclusions: We developed a deep-learning-based approach to medical image translation in the video domain. Although the feasibility and reliability of the proposed framework were demonstrated on CBCT-to-CT image translation, it can be easily extended to other types of medical images. The current results indicate that it is a very promising method that may open a new path for medical image translation research.

Keywords: Deep learning; GAN; Medical image translation; Video domain.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Cone-Beam Computed Tomography / methods
Deep Learning*
Humans
Image Processing, Computer-Assisted / methods
Reproducibility of Results
Signal-To-Noise Ratio