Foundation models are a recently described class of machine learning algorithms that are trained on large amounts of data using techniques that do not require labeling by content experts. They are trained to learn a representation of the patterns present in their pretraining data and to provide that representation as an output. In the context of pathology, they are useful as a "first step" in machine learning projects, supplying an internalized representation of pathology that can then be used for downstream tasks such as tumor classification or biomarker status prediction. Because of the advantages these models offer, including the ability to learn from large datasets while avoiding time-consuming labeling by pathologists, they will likely become more prevalent in pathology machine learning research and, potentially, in clinical application. This perspective provides a non-technical overview of transformer models to help practicing pathologists understand how these models work, which pathology foundation models have been released, and how they are being used in the research setting, while also using head-and-neck-specific data with a publicly available foundation model to illustrate these points.
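To make the "first step" workflow described above concrete, the following is a minimal sketch, not taken from the article: a pretrained vision transformer converts each pathology image tile into a fixed-length embedding, and a small downstream classifier (a linear probe) is trained on those frozen embeddings. A generic ImageNet-pretrained vision transformer from the timm library stands in here for a pathology-specific foundation model, and the tiles and labels are fabricated placeholders.

```python
# Sketch of the foundation-model "first step": extract embeddings from a
# pretrained vision transformer, then train a small downstream classifier.
# The model choice, tiles, and labels are illustrative assumptions only.
import timm
import torch
from sklearn.linear_model import LogisticRegression

# num_classes=0 removes the classification head, so the model returns
# pooled feature embeddings rather than class scores.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
model.eval()

# Stand-in batch: 8 RGB tiles of 224x224 pixels (random noise used here in
# place of normalized whole-slide-image tiles).
tiles = torch.randn(8, 3, 224, 224)

with torch.no_grad():
    embeddings = model(tiles)  # shape: (8, embedding_dim)

# Downstream task, e.g. tumor vs. normal tile classification: fit a simple
# linear probe on the frozen embeddings. Labels are placeholders.
labels = [0, 1, 0, 1, 1, 0, 0, 1]
clf = LogisticRegression(max_iter=1000).fit(embeddings.numpy(), labels)
print(clf.predict(embeddings.numpy()))
```

In practice, a pathology-specific foundation model would replace the generic backbone, and the embeddings would be extracted once and reused across multiple downstream tasks, which is what avoids repeated, label-intensive training from scratch.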
Keywords: Computer vision; Foundation model; Large language model; Transformer; Vision transformer.
© 2026. The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.