Medical multimodal multitask foundation model for lung cancer screening

Nat Commun. 2025 Feb 11;16(1):1523. doi: 10.1038/s41467-025-56822-w.

Abstract

Lung cancer screening (LCS) reduces mortality and involves vast amounts of multimodal data such as text, tables, and images. Fully mining such big data requires multitasking; otherwise, occult but important features may be overlooked, adversely affecting clinical management and healthcare quality. Here we propose a medical multimodal-multitask foundation model (M3FM) for three-dimensional low-dose computed tomography (CT) LCS. After curating a multimodal multitask dataset of 49 clinical data types, 163,725 chest CT series, and 17 tasks involved in LCS, we develop a scalable multimodal question-answering model architecture for synergistic multimodal multitasking. M3FM consistently outperforms state-of-the-art models, improving lung cancer risk prediction and cardiovascular disease mortality risk prediction by up to 20% and 10%, respectively. M3FM processes multiscale high-dimensional images, handles various combinations of multimodal data, identifies informative data elements, and adapts to out-of-distribution tasks with minimal data. In this work, we show that M3FM advances various LCS tasks through large-scale multimodal and multitask learning.

MeSH terms

  • Early Detection of Cancer* / methods
  • Humans
  • Lung Neoplasms* / diagnosis
  • Lung Neoplasms* / diagnostic imaging
  • Lung Neoplasms* / mortality
  • Tomography, X-Ray Computed / methods