An ophthalmic video foundation model for surgical recognition and navigation with wet-lab porcine eye validation

Nat Biomed Eng. 2026 Mar 3. doi: 10.1038/s41551-026-01622-w. Online ahead of print.

Abstract

Foundation models in artificial intelligence are transforming healthcare by using large-scale unlabelled data for pretraining. However, their intraoperative applications remain underexplored owing to limited surgical data and the challenges of real-time deployment. Here we present the ophthalmic video foundation model (OVFM), designed for recognition and navigation in microscopic ophthalmic surgery. Built on a self-supervised video transformer architecture and trained on an ophthalmic video dataset comprising 1.1 million clips across 144 surgical types, OVFM learns the spatiotemporal motion features of ophthalmic procedures. We demonstrate OVFM's superior performance across seven downstream tasks. To enable real-time use, we applied knowledge distillation, which reduces the model's size while retaining its accuracy and allows deployment on surgical microscope units. In cataract surgeries performed by ten surgeons on wet-lab porcine eyes, the OVFM-powered system enhanced surgical performance and reduced skill gaps, demonstrating notable potential for real-time, intraoperative applications across various surgical fields.
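
The abstract does not state which distillation objective was used. As a minimal sketch of the general idea only, the following assumes the classic soft-target formulation, in which a small student model is trained to match the temperature-softened outputs of a large teacher while also fitting the ground-truth label. The function names, the temperature and the weighting factor alpha are illustrative assumptions and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical soft-target distillation loss for one example.

    Assumed form (not confirmed by the source):
    alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy.
    """
    # Soft term: KL divergence between softened teacher and student outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    soft = temperature ** 2 * kl

    # Hard term: cross-entropy of the student against the true label (T = 1).
    hard = -math.log(softmax(student_logits)[label])

    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    print(distillation_loss([1.0, 2.0, 3.0], teacher, label=2))  # student agrees
    print(distillation_loss([3.0, 2.0, 1.0], teacher, label=2))  # student disagrees
```

In this sketch the loss is smaller when the student's logits agree with the teacher's than when they disagree, which is the signal that would drive a compact student towards the behaviour of the larger model.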