Objective. Integrated positron emission tomography (PET)/computed tomography (CT) imaging plays a vital role in tumor diagnosis by offering both anatomical and functional information. However, the high cost and limited accessibility of PET imaging, together with concerns about cumulative radiation exposure from repeated scans, may restrict its clinical use. This study aims to develop a cross-modal medical image synthesis method for generating PET images from CT scans, with a particular focus on accurately synthesizing lesion regions.

Approach. We propose a two-stage generative adversarial network (GAN), termed the multi-modal fusion pre-trained autoencoder GAN (MMF-PAE-GAN), which couples a pre-GAN and a post-GAN through a pre-trained autoencoder (PAE). The pre-GAN produces an initial pseudo-PET image and supplies the post-GAN with PET-related multi-scale features. Unlike a traditional sample-adaptive encoder, the PAE enhances sample-specific representation by extracting multi-scale contextual features. To capture both lesion-related and non-lesion-related anatomical information, two CT scans processed under different window settings are fed into the post-GAN. Furthermore, a multi-modal weighted feature fusion module is introduced to dynamically highlight informative cross-modal features while suppressing redundancies. A perceptual loss (PL), computed with the PAE, imposes feature-space constraints and improves the fidelity of lesion synthesis.

Main results. On the AutoPET dataset, our method achieved a peak signal-to-noise ratio (PSNR) of 29.1781 dB, mean absolute error (MAE) of 0.0094, structural similarity index (SSIM) of 0.9217 and normalized mean squared error (NMSE) of 0.3651 at the pixel level, along with a sensitivity of 85.31%, specificity of 97.02% and accuracy of 95.97% for slice-level classification. On the FAHSU dataset, the corresponding values were a PSNR of 29.1506 dB, MAE of 0.0095, SSIM of 0.9193, NMSE of 0.3663, sensitivity of 84.51%, specificity of 96.82% and accuracy of 95.71%.

Significance. The proposed MMF-PAE-GAN can generate high-quality PET images directly from CT scans without the need for radioactive tracers, potentially improving access to functional imaging and reducing costs in clinical scenarios where PET acquisition is limited or repeated scans are not feasible.
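The fusion and loss components described in the Approach lend themselves to a compact sketch. The following minimal PyTorch example shows one plausible way to dynamically weight cross-modal features and to compute a perceptual loss in the feature space of a frozen pre-trained autoencoder; the module names (MultiModalWeightedFusion, PAEPerceptualLoss), the three-branch input layout and the pooled-gating scheme are illustrative assumptions, not the authors' published implementation.

```python
# Minimal sketch of two components named in the abstract: a weighted
# cross-modal feature fusion module and a PAE-based perceptual loss.
# All module/parameter names and shapes are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiModalWeightedFusion(nn.Module):
    """Dynamically weight and sum feature maps from several branches."""

    def __init__(self, channels: int, num_branches: int = 3):
        super().__init__()
        # One scalar gate per branch, predicted from globally pooled features.
        self.gate = nn.Linear(channels * num_branches, num_branches)

    def forward(self, feats):
        # feats: list of K tensors, each (B, C, H, W), one per branch
        # (e.g. wide-window CT, lesion-window CT, pre-GAN PET features).
        pooled = torch.cat([f.mean(dim=(2, 3)) for f in feats], dim=1)  # (B, C*K)
        w = F.softmax(self.gate(pooled), dim=1)                         # (B, K)
        stacked = torch.stack(feats, dim=1)                             # (B, K, C, H, W)
        # Softmax weights emphasize informative branches, suppress redundant ones.
        return (w[:, :, None, None, None] * stacked).sum(dim=1)        # (B, C, H, W)


class PAEPerceptualLoss(nn.Module):
    """Perceptual loss in the feature space of a frozen pre-trained autoencoder."""

    def __init__(self, pae_encoder: nn.Module):
        super().__init__()
        # The encoder is frozen; it only provides a fixed feature space.
        self.encoder = pae_encoder.eval()
        for p in self.encoder.parameters():
            p.requires_grad_(False)

    def forward(self, fake_pet, real_pet):
        # Assumes the encoder returns a list of multi-scale feature maps.
        loss = fake_pet.new_zeros(())
        for f_fake, f_real in zip(self.encoder(fake_pet), self.encoder(real_pet)):
            loss = loss + F.l1_loss(f_fake, f_real)
        return loss
```

In a training loop of this kind, the fused features would feed the post-GAN generator, and the PL term would be combined with the adversarial and pixel-wise losses under appropriate weights; the specific weighting is a design choice the abstract does not specify.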
Keywords: CT; PET; cross-modal medical image synthesis; generative adversarial network.