AI-based pathology predicts origins for cancers of unknown primary

Nature. 2021 Jun;594(7861):106-110. doi: 10.1038/s41586-021-03512-4. Epub 2021 May 5.


Cancer of unknown primary (CUP) origin is an enigmatic group of diagnoses in which the primary anatomical site of tumour origin cannot be determined [1,2]. This poses a considerable challenge, as modern therapeutics are predominantly specific to the primary tumour [3]. Recent research has focused on using genomics and transcriptomics to identify the origin of a tumour [4-9]. However, genomic testing is not always performed and lacks clinical penetration in low-resource settings. Here, to overcome these challenges, we present a deep-learning-based algorithm, Tumour Origin Assessment via Deep Learning (TOAD), that can provide a differential diagnosis for the origin of the primary tumour using routinely acquired histology slides. We used whole-slide images of tumours with known primary origins to train a model that simultaneously identifies the tumour as primary or metastatic and predicts its site of origin. On our held-out test set of tumours with known primary origins, the model achieved a top-1 accuracy of 0.83 and a top-3 accuracy of 0.96, whereas on our external test set it achieved top-1 and top-3 accuracies of 0.80 and 0.93, respectively. We further curated a dataset of 317 cases of CUP for which a differential diagnosis was assigned. Our model predictions resulted in concordance for 61% of cases and a top-3 agreement of 82%. TOAD can be used as an assistive tool to assign a differential diagnosis to complicated cases of metastatic tumours and CUPs and could be used in conjunction with or in lieu of ancillary tests and extensive diagnostic work-ups to reduce the occurrence of CUP.
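
The top-1 and top-3 accuracies reported above are standard ranking metrics: a case counts as correct when the true site of origin is among the k highest-scoring predicted sites. The following is a minimal sketch of that computation, not the authors' evaluation code. The function name `top_k_accuracy`, the four-class setup, and all scores and labels are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of cases whose true class is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one case is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest predicted score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical scores for 4 cases over 4 candidate origin sites
# (illustrative values only, not data from the study).
scores = [
    [0.70, 0.20, 0.05, 0.05],  # true site 0 is ranked first
    [0.10, 0.30, 0.50, 0.10],  # true site 1 is ranked second
    [0.40, 0.35, 0.15, 0.10],  # true site 3 is ranked last
    [0.05, 0.05, 0.10, 0.80],  # true site 3 is ranked first
]
labels = [0, 1, 3, 3]

top1 = top_k_accuracy(scores, labels, k=1)
top3 = top_k_accuracy(scores, labels, k=3)
print(f"top-1 accuracy: {top1:.2f}, top-3 accuracy: {top3:.2f}")
```

Top-3 accuracy is always at least as high as top-1 accuracy, which is why a top-3 list is a natural fit for a differential diagnosis: the model is credited when the correct origin appears among a short list of candidates rather than only when it is ranked first.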

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • Cohort Studies
  • Computer Simulation* / standards
  • Female
  • Humans
  • Male
  • Neoplasm Metastasis / pathology
  • Neoplasms, Unknown Primary / diagnosis
  • Neoplasms, Unknown Primary / pathology*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Workflow