Purpose: Immunohistochemical (IHC) biomarkers such as estrogen receptor (ER), progesterone receptor (PR), HER2, and Ki-67 are essential for the classification and treatment of breast cancer. While radiomics-based models have demonstrated potential in non-invasive biomarker prediction, the utility of large language models (LLMs) for this task using only textual clinical data remains largely unexplored. This study aimed to evaluate the performance of ChatGPT-4o, a generative LLM, in predicting key IHC biomarkers based solely on structured radiological and pathological reports.
Methods: Fifty-five patients with breast cancer were retrospectively analyzed. For each patient, structured clinical, imaging, and pathology reports (excluding IHC data) were entered into ChatGPT-4o. The model was prompted to predict ER, PR, HER2, and Ki-67 expression status. Predictions were repeated at two time points to assess reproducibility. Diagnostic performance was evaluated against the pathology results using accuracy, sensitivity, specificity, and Cohen's kappa.
Results: The model achieved its highest accuracy for HER2 prediction (83.6%, κ = 0.51), followed by ER (81.8%, κ = 0.44) and PR (76.4%, κ = 0.39). For high Ki-67 expression, sensitivity was 88.9%, with moderate overall agreement (κ = 0.55). Agreement between the two prediction time points was substantial to almost perfect for all biomarkers (κ = 0.69–0.83).
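The metrics named in the Methods, and the test-retest agreement reported in the Results, can be illustrated with a minimal sketch. It assumes each biomarker is coded per patient as a binary label (1 = positive or high, 0 = negative or low) with pathology as the reference; the function names and example labels are hypothetical and are neither the study's code nor its data.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(truth: Sequence[int], pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn) for binary labels coded 1 = positive/high, 0 = negative/low."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    if any(label not in (0, 1) for label in (*truth, *pred)):
        raise ValueError("labels must be coded as 0 or 1")
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def diagnostic_metrics(truth: Sequence[int], pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, sensitivity, specificity, and Cohen's kappa of pred against truth."""
    tp, fn, fp, tn = confusion_counts(truth, pred)
    n = tp + fn + fp + tn
    if n == 0:
        raise ValueError("at least one case is required")
    nan = float("nan")
    accuracy = (tp + tn) / n
    # Sensitivity: share of reference-positive cases that the model called positive.
    sensitivity = tp / (tp + fn) if tp + fn else nan
    # Specificity: share of reference-negative cases that the model called negative.
    specificity = tn / (tn + fp) if tn + fp else nan
    # Cohen's kappa = (p_o - p_e) / (1 - p_e): observed agreement p_o corrected
    # for the chance agreement p_e implied by each rater's marginal totals.
    p_o = accuracy
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else nan
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical labels for 10 patients (illustrative only, not study data).
pathology = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
model_run1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
print(diagnostic_metrics(pathology, model_run1))
```

With these made-up labels, accuracy, sensitivity, and specificity are each 0.8 and kappa is about 0.6. Because kappa is symmetric in its two raters, the same function can express reproducibility by passing the first prediction run as `truth` and the second as `pred` (only the kappa value is meaningful in that case). scikit-learn's `sklearn.metrics.cohen_kappa_score` can serve as an independent cross-check.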
Conclusion: Using only structured textual data, ChatGPT-4o predicted IHC biomarker status with accuracies of 76.4% to 83.6% for ER, PR, and HER2. Its performance was comparable to that of radiomics models, suggesting a feasible and accessible AI tool to support early clinical decision-making, especially in resource-limited settings or before IHC results are available.
Keywords: Artificial intelligence; Biomarkers; Breast neoplasms; Immunohistochemistry; Large language models; Natural language processing.
© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2025. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.