Purpose: Nonlinear multimodal image registration, for example, the fusion of computed tomography (CT) and magnetic resonance imaging (MRI), fundamentally depends on a definition of image similarity. Previous methods that derived modality-invariant representations focused on either global statistical grayscale relations or local structural similarity, both of which are prone to local optima. Most learning-based methods, in turn, rely on strong supervision in the form of aligned multimodal image pairs. We aim to overcome this limitation so that learning-based registration becomes applicable to further practical use cases.
Methods: We propose a new concept that exploits anatomical shape information and requires only segmentation labels for each modality individually. First, a shape-constrained encoder-decoder segmentation network without skip connections is jointly trained on labeled CT and MRI inputs. Second, an iterative energy-based minimization scheme is introduced that relies on the capability of the network to generate intermediate nonlinear shape representations. These intermediate representations further ease multimodal alignment in the case of large deformations.
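As a rough illustration of why intermediate shape targets can help gradient-based alignment under large displacements, the following toy sketch registers a 1D Gaussian "shape" by gradient descent on a sum-of-squared-differences energy. It is not the proposed network or energy: the functions `bump` and `minimize_shift`, the linear interpolation of the shape center, and all parameter values are hypothetical stand-ins chosen for this example only.

```python
import numpy as np

def bump(x, center, width=1.0):
    """Hypothetical smooth shape representation: a 1D Gaussian bump."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def minimize_shift(x, target, center, t0, width=1.0, steps=50, lr=0.2):
    """Gradient descent on E(t) = sum((bump(x, center + t) - target)^2)."""
    t = t0
    for _ in range(steps):
        warped = bump(x, center + t, width)
        # Analytic derivative of the warped shape with respect to the shift t.
        d_warped = warped * (x - center - t) / width**2
        grad = 2.0 * np.sum((warped - target) * d_warped)
        t -= lr * grad
    return t

x = np.linspace(-10.0, 30.0, 81)
moving_center, fixed_center = 4.0, 16.0   # true shift is 12, far beyond the bump width
fixed = bump(x, fixed_center)

# Direct alignment: the shapes barely overlap, so the gradient is almost zero.
t_direct = minimize_shift(x, fixed, moving_center, t0=0.0, steps=400)

# Guided alignment: step through intermediate shapes (assumed here to be
# simple interpolations of the center) and warm-start each stage.
t_guided = 0.0
for alpha in np.linspace(0.125, 1.0, 8):
    intermediate = bump(x, (1 - alpha) * moving_center + alpha * fixed_center)
    t_guided = minimize_shift(x, intermediate, moving_center, t0=t_guided)

print(f"direct shift: {t_direct:.4f}, guided shift: {t_guided:.4f}")
```

In this toy setting the direct optimization stalls near its initial value because the energy is flat when the shapes do not overlap, whereas the staged optimization recovers the full shift. Because each stage is plain gradient descent, the same loop could in principle be written with an automatic-differentiation framework and run on a GPU.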
Results: Our approach robustly and accurately aligns 3D scans from the multimodal whole-heart segmentation dataset, outperforming classical unsupervised frameworks. Since both parts of our method rely on (stochastic) gradient optimization, it can be easily integrated into deep learning frameworks and executed on GPUs.
Conclusions: We present an integrated approach for weakly supervised multimodal image registration. The promising results obtained by using intermediate shape features as registration guidance encourage further research in this direction.
Keywords: Encoder–decoder network; Guided image registration; Multimodal fusion; Nonlinear shape interpolation.