Clinical-Grade Detection of Microsatellite Instability in Colorectal Tumors by Deep Learning

Gastroenterology. 2020 Oct;159(4):1406-1416.e11. doi: 10.1053/j.gastro.2020.06.021. Epub 2020 Jun 17.


Background & aims: Microsatellite instability (MSI) and mismatch-repair deficiency (dMMR) in colorectal tumors are used to select treatment for patients. Deep learning can detect MSI and dMMR in tumor samples on routine histology slides faster and less expensively than molecular assays. However, clinical application of this technology requires high performance and multisite validation, which have not yet been performed.

Methods: We collected H&E-stained slides and findings from molecular analyses for MSI and dMMR from 8836 colorectal tumors (of all stages) included in the MSIDETECT consortium study, from Germany, the Netherlands, the United Kingdom, and the United States. Specimens with dMMR were identified by immunohistochemistry analyses of tissue microarrays for loss of MLH1, MSH2, MSH6, and/or PMS2. Specimens with MSI were identified by genetic analyses. We trained a deep-learning detector to identify samples with MSI from these slides; performance was assessed by cross-validation (n = 6406 specimens) and validated in an external cohort (n = 771 specimens). Prespecified endpoints were the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).
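As an illustration of the two prespecified endpoints, the following is a minimal sketch of how AUROC and AUPRC can be computed from per-specimen scores. It is not the study's evaluation code: the function names, the toy labels, and the toy scores are hypothetical, AUPRC is approximated here by average precision, and the average-precision helper assumes no tied scores.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random
    negative (Mann-Whitney identity); tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a common step-wise estimate of AUPRC.
    Assumes scores are not tied (ties would need threshold grouping)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank specimens from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: 1 = MSI/dMMR, 0 = microsatellite stable.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))
print(average_precision(toy_labels, toy_scores))
```

Unlike AUROC, AUPRC depends on class prevalence: a classifier that scores at random has an expected AUPRC equal to the fraction of positive specimens, so the two endpoints are not directly comparable to each other.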

Results: The deep-learning detector identified specimens with dMMR or MSI with a mean AUROC of 0.92 (lower bound, 0.91; upper bound, 0.93) and an AUPRC of 0.63 (range, 0.59-0.65), or 67% specificity at 95% sensitivity, in the cross-validation development cohort. In the validation cohort, the classifier identified samples with dMMR with an AUROC of 0.95 (range, 0.92-0.96) without image preprocessing and an AUROC of 0.96 (range, 0.93-0.98) after color normalization.
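The specificity and sensitivity figures above describe a single operating point on the ROC curve. The abstract does not state how that point was selected, so the sketch below rests on an assumption: the threshold is the highest score cutoff that still reaches a target sensitivity, and specificity is then read off at that cutoff. The function name and toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def specificity_at_sensitivity(
    labels: Sequence[int],
    scores: Sequence[float],
    target_sensitivity: float = 0.95,
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) for the highest threshold
    whose sensitivity is at least `target_sensitivity`.
    A specimen is called positive when its score is >= threshold."""
    pos = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Minimum number of positives that must be detected; the small epsilon
    # guards against floating-point overshoot in the product.
    k = max(1, math.ceil(target_sensitivity * len(pos) - 1e-9))
    threshold = pos[k - 1]
    sensitivity = sum(s >= threshold for s in pos) / len(pos)
    specificity = sum(s < threshold for s in neg) / len(neg)
    return threshold, sensitivity, specificity


# Hypothetical toy data: 1 = MSI/dMMR, 0 = microsatellite stable.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
print(specificity_at_sensitivity(toy_labels, toy_scores, target_sensitivity=0.75))
```

Fixing sensitivity high and accepting lower specificity suits a screening use: few MSI or dMMR cases are missed, and the specimens flagged as positive can go on to confirmatory molecular testing.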

Conclusions: We developed a deep-learning system that detects colorectal cancer specimens with dMMR or MSI using H&E-stained slides; it detected tissues with dMMR with an AUROC of 0.96 in a large, international validation cohort. This system might be used for high-throughput, low-cost evaluation of colorectal tissue specimens.

Keywords: Lynch syndrome; biomarker; cancer immunotherapy; mutation.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Brain Neoplasms / diagnosis*
  • Brain Neoplasms / genetics
  • Brain Neoplasms / metabolism
  • Cohort Studies
  • Colorectal Neoplasms / diagnosis*
  • Colorectal Neoplasms / genetics
  • Colorectal Neoplasms / metabolism
  • DNA-Binding Proteins / metabolism
  • Deep Learning*
  • Female
  • Humans
  • Male
  • Microsatellite Instability*
  • Middle Aged
  • Mismatch Repair Endonuclease PMS2 / metabolism
  • MutL Protein Homolog 1 / metabolism
  • MutS Homolog 2 Protein / metabolism
  • Neoplastic Syndromes, Hereditary / diagnosis*
  • Neoplastic Syndromes, Hereditary / genetics
  • Neoplastic Syndromes, Hereditary / metabolism
  • Predictive Value of Tests
  • ROC Curve

Substances

  • DNA-Binding Proteins
  • G-T mismatch-binding protein
  • MLH1 protein, human
  • PMS2 protein, human
  • MSH2 protein, human
  • Mismatch Repair Endonuclease PMS2
  • MutL Protein Homolog 1
  • MutS Homolog 2 Protein

Supplementary concepts

  • Turcot syndrome