MedSapiens: Taking a pose to rethink medical imaging landmark detection

Med Image Anal. 2026 Mar 7;111:104015. doi: 10.1016/j.media.2026.104015. Online ahead of print.

Abstract

Accurate anatomical landmark detection is crucial for medical image analysis, yet progress in the medical domain is constrained by the scarcity of large, diverse datasets and by methods that rely heavily on domain-specific priors. Notably, human landmark detection models trained on large and diverse datasets offer spatial localization abilities that conceptually align with medical landmark detection. In this study, we investigate the adaptation of Sapiens, a human-centric foundation model designed for pose estimation, to medical imaging through a multi-dataset pretraining strategy, establishing new state-of-the-art performance across multiple benchmarks. Our proposed model, MedSapiens, demonstrates that human-centric foundation models, originally optimized for spatial pose localization, provide strong and transferable priors for anatomical landmark detection. We evaluate MedSapiens across six tasks spanning three imaging modalities. On the internal landmark detection benchmarks, MedSapiens achieves up to 5.26% improvement over generalist foundation models and up to 21.81% improvement over specialist methods. To assess cross-domain generalization, we further evaluate MedSapiens on two novel external downstream tasks: a dental CBCT landmark detection task and an echocardiography video measurement estimation task. Compared with state-of-the-art methods, MedSapiens achieves a 2.69% relative gain in success detection rate on the dental CBCT task and up to a 43% reduction in measurement error on the echocardiography task. Code and model weights are available at https://github.com/xmed-lab/MedSapiens.

Keywords: Echocardiography measurement estimation; Foundation models; Landmark detection; Medical imaging.
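
The abstract reports results as a relative gain in success detection rate and a reduction in measurement error, without defining either metric. The sketch below shows how such quantities are commonly computed in landmark detection work; it is an illustration under stated assumptions, not the authors' evaluation code. The function names, the `spacing` parameter, and the threshold are hypothetical and do not come from the paper or its repository.

```python
import numpy as np


def radial_errors(pred, gt, spacing=1.0):
    """Euclidean distance between predicted and ground-truth landmarks.

    pred, gt: arrays of shape (num_landmarks, num_dims).
    spacing: physical size per pixel/voxel (scalar or per-axis), an assumption
             used here to convert pixel offsets to physical units such as mm.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return np.linalg.norm((pred - gt) * spacing, axis=-1)


def success_detection_rate(pred, gt, threshold, spacing=1.0):
    """Fraction of landmarks whose radial error is within `threshold`."""
    return float(np.mean(radial_errors(pred, gt, spacing) <= threshold))


def relative_gain(new, baseline):
    """Relative change of `new` with respect to `baseline` (0.05 means +5%)."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Toy coordinates, invented purely for illustration.
    gt = [[0.0, 0.0], [0.0, 0.0]]
    pred = [[0.0, 0.0], [3.0, 4.0]]
    print(radial_errors(pred, gt))
    print(success_detection_rate(pred, gt, threshold=2.0))
    print(relative_gain(0.6, 0.5))
```

A relative gain divides the difference by the baseline value, so it is not the same as an absolute difference in percentage points; which baseline the paper uses for each reported figure is not stated in the abstract.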