A Pipeline for Evaluation of Machine Learning/Artificial Intelligence Models to Quantify Programmed Death Ligand 1 Immunohistochemistry

Lab Invest. 2024 Apr 26;104(6):102070. doi: 10.1016/j.labinv.2024.102070. Online ahead of print.

Abstract

Immunohistochemistry (IHC) is used to guide treatment decisions in multiple cancer types. For treatment with checkpoint inhibitors, programmed death ligand 1 (PD-L1) IHC is used as a companion diagnostic. However, the scoring of PD-L1 is complicated by its expression in both cancer and immune cells. Separation of cancer and noncancer regions is needed to calculate the PD-L1 tumor proportion score (TPS), which is based on the percentage of PD-L1-positive cancer cells. Evaluation of PD-L1 expression requires highly experienced pathologists and is often challenging and time-consuming. Here, we used a multi-institutional cohort of 77 lung cancer cases stained centrally with the PD-L1 22C3 clone. We developed a 4-step pipeline for measuring TPS that includes the coregistration of hematoxylin and eosin, PD-L1, and negative control (NC) digital slides for exclusion of necrosis, segmentation of cancer regions, and quantification of PD-L1+ cells. As cancer segmentation is a challenging step for TPS generation, we trained DeepLab V3 in the Visiopharm software package to outline cancer regions in PD-L1 and NC images and evaluated the model performance by mean intersection over union (mIoU) against manual outlines. Only 14 cases were required to achieve an mIoU of 0.82 for cancer segmentation in hematoxylin-stained NC cases. For PD-L1-stained slides, a model trained on PD-L1 tiles augmented by registered NC tiles achieved an mIoU of 0.79. In segmented cancer regions from whole slide images, the digital TPS achieved an accuracy of 75% against the manual TPS from the pathology report. Major reasons for algorithmic inaccuracies include the inclusion of immune cells in cancer outlines and poor nuclear segmentation of cancer cells. Our transparent and stepwise approach and performance metrics can be applied to any IHC assay to provide pathologists with important insights on when to apply and how to evaluate commercial automated IHC scoring systems.
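
The two quantities reported above, mIoU and TPS, can be illustrated with a minimal Python sketch. It is not the Visiopharm implementation used in the study. The function names are hypothetical. Averaging IoU over the classes of a flattened label mask (0 = noncancer, 1 = cancer) is an assumption, since the abstract does not state how the mean was taken. The <1%, 1-49%, and >=50% bins are the cutoffs commonly reported for the 22C3 assay in lung cancer, and the abstract does not confirm that accuracy was computed on these bins.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], truth: Sequence[int], num_classes: int = 2) -> float:
    """Mean intersection over union, averaged over classes (assumed convention).

    `pred` and `truth` are flattened label masks of equal length,
    e.g. 0 = noncancer and 1 = cancer.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in either mask")
    return sum(ious) / len(ious)


def tumor_proportion_score(positive_cancer_cells: int, total_cancer_cells: int) -> float:
    """TPS as the percentage of PD-L1-positive cells among all cancer cells."""
    if total_cancer_cells <= 0:
        raise ValueError("total_cancer_cells must be positive")
    return 100.0 * positive_cancer_cells / total_cancer_cells


def tps_category(tps: float) -> str:
    """Bin a TPS value using assumed clinical cutoffs (<1%, 1-49%, >=50%)."""
    if tps < 1.0:
        return "<1%"
    if tps < 50.0:
        return "1-49%"
    return ">=50%"
```

For example, a slide with 30 PD-L1-positive cells among 120 segmented cancer cells would give a TPS of 25% and fall in the 1-49% bin under these assumed cutoffs. Immune cells mistakenly included in the cancer outline would enter both counts, which is one way the segmentation errors noted above can propagate into the score.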

Keywords: cancer segmentation; digital pathology; programmed death ligand 1; tumor proportion scores.