XRayWizard: Reconstructing 3-D lung surfaces from a single 2-D chest x-ray image via Vision Transformer

Med Phys. 2024 Apr;51(4):2806-2816. doi: 10.1002/mp.16781. Epub 2023 Oct 11.

Abstract

Background: Chest x-ray is widely utilized for the evaluation of pulmonary conditions due to its technical simplicity, cost-effectiveness, and portability. However, as a two-dimensional (2-D) imaging modality, chest x-ray images depict limited anatomical details and are challenging to interpret.

Purpose: To validate the feasibility of reconstructing three-dimensional (3-D) lungs from a single 2-D chest x-ray image via Vision Transformer (ViT).

Methods: We created a cohort of 2525 paired chest x-ray images (scout images) and computed tomography (CT) acquired on different subjects and we randomly partitioned them as follows: (1) 1800 - training set, (2) 200 - validation set, and (3) 525 - testing set. The 3-D lung volumes segmented from the chest CT scans were used as the ground truth for supervised learning. We developed a novel model termed XRayWizard that employed ViT blocks to encode the 2-D chest x-ray image. The aim is to capture global information and establish long-range relationships, thereby improving the performance of 3-D reconstruction. Additionally, a pooling layer at the end of each transformer block was introduced to extract feature information. To produce smoother and more realistic 3-D models, a set of patch discriminators was incorporated. We also devised a novel method to incorporate subject demographics as an auxiliary input to further improve the accuracy of 3-D lung reconstruction. Dice coefficient and mean volume error were used as performance metrics as the agreement between the computerized results and the ground truth.

Results: In the absence of subject demographics, the mean Dice coefficient for the generated 3-D lung volumes achieved a value of 0.738 ± 0.091. When subject demographics were included as an auxiliary input, the mean Dice coefficient significantly improved to 0.769 ± 0.089 (p < 0.001), and the volume prediction error was reduced from 23.5 ± 2.7%. to 15.7 ± 2.9%.

Conclusion: Our experiment demonstrated the feasibility of reconstructing 3-D lung volumes from 2-D chest x-ray images, and the inclusion of subject demographics as additional inputs can significantly improve the accuracy of 3-D lung volume reconstruction.

Keywords: 2‐D chest x‐ray; 3‐D reconstruction; deep learning; lung reconstruction; vision transformer.

MeSH terms

  • Humans
  • Image Processing, Computer-Assisted / methods
  • Lung* / diagnostic imaging
  • Thorax*
  • Tomography, X-Ray Computed / methods
  • X-Rays