Background: Histological grade is a well-known prognostic factor that is routinely assessed in breast tumours. However, manual assessment of Nottingham Histological Grade (NHG) has high inter-assessor and inter-laboratory variability, causing uncertainty in grade assignments. To address this challenge, we developed and validated a three-level NHG-like deep learning-based histological grade model (predGrade). The primary performance evaluation focuses on prognostic performance.
Methods: This observational study is based on two patient cohorts (SöS-BC-4, N = 2421 (training and internal test); SCAN-B-Lund, N = 1262 (test)) that include routine histological whole-slide images (WSIs) together with patient outcomes. A deep convolutional neural network (CNN) model with an attention mechanism was optimised for the classification of the three-level histological grading (NHG) from haematoxylin and eosin-stained WSIs. The prognostic performance was evaluated by time-to-event analysis of recurrence-free survival and compared to clinical NHG grade assignments in the internal test set as well as in the fully independent external test cohort.
Results: We observed effect sizes (hazard ratio) for grade 3 versus 1, for the conventional NHG method (HR = 2.60 (1.18-5.70 95%CI, p-value = 0.017)) and the deep learning model (HR = 2.27, 95%CI 1.07-4.82, p-value = 0.033) on the internal test set after adjusting for established clinicopathological risk factors. In the external test set, the unadjusted HR for clinical NHG 2 versus 1 was estimated to be 2.59 (p-value = 0.004) and clinical NHG 3 versus 1 was estimated to be 3.58 (p-value < 0.001). For predGrade, the unadjusted HR for predGrade 2 versus 1 HR = 2.52 (p-value = 0.030), and 4.07 (p-value = 0.001) for preGrade 3 versus 1 was observed in the independent external test set. In multivariable analysis, HR estimates for neither clinical NHG nor predGrade were found to be significant (p-value > 0.05). We tested for differences in HR estimates between NHG and predGrade in the independent test set and found no significant difference between the two classification models (p-value > 0.05), confirming similar prognostic performance between conventional NHG and predGrade.
Conclusion: Routine histopathology assessment of NHG has a high degree of inter-assessor variability, motivating the development of model-based decision support to improve reproducibility in histological grading. We found that the proposed model (predGrade) provides a similar prognostic performance as clinical NHG. The results indicate that deep CNN-based models can be applied for breast cancer histological grading.
Keywords: Breast cancer; Clinical decision support; Deep learning; Image analysis; Pathology.
© 2024. The Author(s).