Objective. Unsupervised deep learning has shown great promise in deformable image registration (DIR). These methods update model weights to optimize image similarity without requiring ground-truth deformation vector fields (DVFs). However, they inherently face ill-conditioning challenges due to structural ambiguities. This study aims to address these issues by integrating the implicit anatomical understanding of vision foundation models (FMs) into a multi-scale unsupervised framework for accurate and robust DIR.
Approach. Our method takes moving and fixed images as inputs and leverages a pre-trained encoder from a vision FM to extract latent features. These features are merged with those extracted by convolutional adaptors to incorporate inductive bias. Correlation-aware multi-layer perceptrons decode the features into DVFs. A pyramid architecture captures multi-range dependencies, further enhancing DIR robustness and accuracy. We evaluated our method on a multi-modality, cross-institutional database comprising 150 cardiac cine MR and 40 liver CT scans.
Main results. Our model generates realistic and accurate DVFs. Moving images deformed by our method showed excellent similarity to the fixed images, achieving a registration Dice score of 0.869 ± 0.093 for cardiac MRI and an average landmark error of 1.60 ± 1.44 mm for liver CT, substantially surpassing state-of-the-art methods. Ablation studies further verified that integrating foundation features improves DIR accuracy (p < 0.05).
Significance. Our approach demonstrates significant advances in DIR for multi-modality images with complex structures and low contrast, making it a powerful tool for a wide range of applications in medical image analysis.
Keywords: cardiac MR registration; deep learning; deformable image registration; foundation model; liver CT registration; unsupervised learning.
© 2026 Institute of Physics and Engineering in Medicine. All rights, including for text and data mining, AI training, and similar technologies, are reserved.
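The Approach section relies on two standard DIR operations: backward warping of the moving image by a dense DVF, and coarse-to-fine composition of flows in a pyramid. A minimal NumPy sketch of both, using nearest-neighbour sampling for brevity (the function names `warp` and `compose`, and the 2D setting, are illustrative assumptions, not the paper's implementation, which would typically use differentiable bilinear/trilinear sampling):

```python
import numpy as np

def warp(image, dvf):
    """Backward-warp a 2D image with a dense displacement field.

    dvf has shape (2, H, W): dvf[0] is the y-displacement, dvf[1] the
    x-displacement. output(p) = image(p + dvf(p)), sampled with
    nearest-neighbour interpolation and border clipping for simplicity.
    """
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.round(ys + dvf[0]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + dvf[1]).astype(int), 0, w - 1)
    return image[src_y, src_x]

def compose(coarse, fine):
    """Compose an (already upsampled) coarse flow with a finer residual flow.

    total(p) = coarse(p + fine(p)) + fine(p), i.e. the coarse field is
    resampled through the fine field before adding, as in coarse-to-fine
    pyramid registration.
    """
    warped_coarse = np.stack([warp(channel, fine) for channel in coarse])
    return warped_coarse + fine
```

In a pyramid, the coarsest-level DVF captures long-range motion; each finer level predicts a residual field and composes it with the upsampled coarser estimate, which is one common way multi-range dependencies improve robustness.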