A whole-slide foundation model for digital pathology from real-world data

Hanwen Xu; Naoto Usuyama; Jaspreet Bagga; Sheng Zhang; Rajesh Rao; Tristan Naumann; Cliff Wong; Zelalem Gero; Javier González; Yu Gu; Yanbo Xu; Mu Wei; Wenhui Wang; Shuming Ma; Furu Wei; Jianwei Yang; Chunyuan Li; Jianfeng Gao; Jaylen Rosemon; Tucker Bower; Soohee Lee; Roshanthi Weerasinghe; Bill J Wright; Ari Robicsek; Brian Piening; Carlo Bifulco; Sheng Wang; Hoifung Poon

doi:10.1038/s41586-024-07441-w

Nature. 2024 Jun;630(8015):181-188. doi: 10.1038/s41586-024-07441-w. Epub 2024 May 22.

Authors

Hanwen Xu^#¹,², Naoto Usuyama^#¹, Jaspreet Bagga¹, Sheng Zhang¹, Rajesh Rao¹, Tristan Naumann¹, Cliff Wong¹, Zelalem Gero¹, Javier González¹, Yu Gu¹, Yanbo Xu¹, Mu Wei¹, Wenhui Wang¹, Shuming Ma¹, Furu Wei¹, Jianwei Yang¹, Chunyuan Li¹, Jianfeng Gao¹, Jaylen Rosemon³, Tucker Bower³, Soohee Lee⁴, Roshanthi Weerasinghe⁴, Bill J Wright⁴, Ari Robicsek⁴, Brian Piening³,⁵, Carlo Bifulco⁶,⁷, Sheng Wang⁸,⁹, Hoifung Poon¹⁰

Affiliations

¹ Microsoft Research, Redmond, WA, USA.
² Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
³ Providence Genomics, Portland, OR, USA.
⁴ Providence Research Network, Renton, WA, USA.
⁵ Earle A. Chiles Research Institute, Providence Cancer Institute, Portland, OR, USA.
⁶ Providence Genomics, Portland, OR, USA. carlo.bifulco@providence.org.
⁷ Earle A. Chiles Research Institute, Providence Cancer Institute, Portland, OR, USA. carlo.bifulco@providence.org.
⁸ Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA. swang@cs.washington.edu.
⁹ Department of Surgery, University of Washington, Seattle, WA, USA. swang@cs.washington.edu.
¹⁰ Microsoft Research, Redmond, WA, USA. hoifung@microsoft.com.

^# Contributed equally.

Abstract

Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles¹⁻³. Prior models have often resorted to subsampling a small portion of tiles for each slide, thus missing the important slide-level context⁴. Here we present Prov-GigaPath, a whole-slide pathology foundation model pretrained on 1.3 billion 256 × 256 pathology image tiles in 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres. The slides originated from more than 30,000 patients covering 31 major tissue types. To pretrain Prov-GigaPath, we propose GigaPath, a novel vision transformer architecture for pretraining gigapixel pathology slides. To scale GigaPath for slide-level learning with tens of thousands of image tiles, GigaPath adapts the newly developed LongNet⁵ method to digital pathology. To evaluate Prov-GigaPath, we construct a digital pathology benchmark comprising 9 cancer subtyping tasks and 17 pathomics tasks, using both Providence and TCGA data⁶. With large-scale pretraining and ultra-large-context modelling, Prov-GigaPath attains state-of-the-art performance on 25 out of 26 tasks, with significant improvement over the second-best method on 18 tasks. We further demonstrate the potential of Prov-GigaPath on vision-language pretraining for pathology⁷,⁸ by incorporating the pathology reports. In sum, Prov-GigaPath is an open-weight foundation model that achieves state-of-the-art performance on various digital pathology tasks, demonstrating the importance of real-world data and whole-slide modelling.
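
To make the scale described in the abstract concrete, the following is a minimal Python sketch of the tile arithmetic. It is not taken from the paper or from the Prov-GigaPath code: the function names `tile_grid` and `tile_count` and the example slide dimensions are hypothetical, chosen only for illustration. Only the 256 × 256 tile size and the totals of 1.3 billion tiles and 171,189 slides come from the abstract.

```python
import math

TILE_SIZE = 256  # tile edge length in pixels, as stated in the abstract


def tile_grid(width_px: int, height_px: int, tile: int = TILE_SIZE) -> tuple[int, int]:
    """Return (columns, rows) of non-overlapping tiles needed to cover a slide."""
    return math.ceil(width_px / tile), math.ceil(height_px / tile)


def tile_count(width_px: int, height_px: int, tile: int = TILE_SIZE) -> int:
    """Upper bound on tiles per slide; assumes no background tiles are discarded."""
    cols, rows = tile_grid(width_px, height_px, tile)
    return cols * rows


# Hypothetical slide dimensions (an assumption, not a figure from the paper).
slide_w, slide_h = 50_000, 40_000
print("tiles covering the example slide:", tile_count(slide_w, slide_h))

# Average tiles per slide implied by the totals reported in the abstract.
avg_tiles_per_slide = 1_300_000_000 / 171_189
print("average tiles per slide:", round(avg_tiles_per_slide))
```

For the assumed 50,000 × 40,000 pixel slide, the full grid contains tens of thousands of tiles, which matches the order of magnitude the abstract cites. The average implied by the reported totals is lower, roughly 7,600 tiles per slide; one plausible reason, stated here as an assumption, is that not every grid position ends up as a pretraining tile.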

MeSH terms

Benchmarking
Humans
Image Processing, Computer-Assisted
Neoplasms* / pathology
Pathology, Clinical