UA-VLFM: An Uncertainty-aware Vision-Language Foundation Model for Auxiliary Diagnosis of Vitreoretinal Lymphoma

IEEE J Biomed Health Inform. 2025 Sep 19:PP. doi: 10.1109/JBHI.2025.3611985. Online ahead of print.

Abstract

Vitreoretinal lymphoma (VRL) is a rare malignant ocular tumor, and its early diagnosis is crucial for patient prognosis. However, owing to its insidious and diverse clinical manifestations, it is often misdiagnosed as other ophthalmic diseases, leading to blindness or even fatal outcomes. In this study, an uncertainty-aware vision-language foundation model (UA-VLFM) based on contrastive learning and uncertainty estimation is developed for the automatic classification of VRL and five other retinal diseases. First, we integrate masked autoencoder (MAE) pretraining knowledge from large-scale optical coherence tomography (OCT) images with an efficient low-rank adaptation (LoRA) optimization strategy to enhance the representation ability and optimization efficiency of the model. Moreover, an uncertainty-aware contrastive learning method based on the Dirichlet distribution is proposed within the contrastive vision-language pretraining framework to further align vision and language features in the high-dimensional embedding space and to produce predictions with corresponding uncertainty scores, thereby enhancing the reliability of VRL diagnosis. On a test dataset of 5,563 OCT images, UA-VLFM achieves an average F1 score of 0.9684, higher than other state-of-the-art algorithms (0.8186–0.9427), and improves to 0.9839 with the threshold strategy. Notably, on VRL, the most challenging category, the proposed UA-VLFM achieves F1 scores of 0.9217 and 0.9544 before and after thresholding, respectively, outperforming other methods (0.5089–0.9366 and 0.6639–0.9133). Our UA-VLFM provides a trustworthy method for aiding the diagnosis of VRL on retinal OCT images. The code has been released on GitHub: https://github.com/wang-wen-wen/UA-VLFM.
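
As context for the Dirichlet-based uncertainty estimation and the threshold strategy described above, the sketch below shows one common way such uncertainty scores are derived in evidential deep learning. It is illustrative only: the function name dirichlet_prediction, the softplus evidence mapping, the u = K/S uncertainty score, and the 0.5 rejection threshold are assumptions, not the released UA-VLFM implementation (see the GitHub repository for the actual code).

```python
# Minimal sketch (assumed, not the released UA-VLFM code): deriving class
# probabilities and a per-sample uncertainty score from a Dirichlet
# distribution over six classes, in the style of evidential deep learning.
import torch
import torch.nn.functional as F

NUM_CLASSES = 6  # VRL + five other retinal diseases (per the abstract)

def dirichlet_prediction(logits: torch.Tensor):
    """Map raw logits to expected class probabilities and an
    uncertainty score in (0, 1]."""
    evidence = F.softplus(logits)                # non-negative evidence per class
    alpha = evidence + 1.0                       # Dirichlet concentration parameters
    strength = alpha.sum(dim=-1, keepdim=True)   # total evidence S
    probs = alpha / strength                     # expected probabilities E[p_k]
    uncertainty = NUM_CLASSES / strength         # u = K / S: high when evidence is low
    return probs, uncertainty.squeeze(-1)

# Usage: a threshold strategy would defer high-uncertainty predictions
# to a clinician rather than report them automatically.
logits = torch.randn(4, NUM_CLASSES)             # stand-in for model outputs
probs, u = dirichlet_prediction(logits)
accepted = u < 0.5                               # hypothetical threshold
print(probs.argmax(dim=-1), u, accepted)
```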