Text-based prediction of immunohistochemical biomarkers in breast cancer using a generative large language model: a retrospective study

Health Inf Sci Syst. 2025 Nov 21;14(1):3. doi: 10.1007/s13755-025-00398-8. eCollection 2026 Dec.

Abstract

Purpose: Immunohistochemical (IHC) biomarkers such as estrogen receptor (ER), progesterone receptor (PR), HER2, and Ki-67 are essential for the classification and treatment of breast cancer. While radiomics-based models have demonstrated potential in non-invasive biomarker prediction, the utility of large language models (LLMs) for this task using only textual clinical data remains largely unexplored. This study aimed to evaluate the performance of ChatGPT-4o, a generative LLM, in predicting key IHC biomarkers based solely on structured radiological and pathological reports.

Methods: Fifty-five patients with breast cancer were retrospectively analyzed. For each patient, structured clinical, imaging, and pathology reports (excluding IHC data) were entered into ChatGPT-4o. The model was prompted to generate predictions for ER, PR, HER2, and Ki-67 expression. Predictions were repeated at two time points to assess reproducibility. Diagnostic performance was compared to pathology results using accuracy, sensitivity, specificity, and Cohen's kappa.
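As an illustration of the four agreement measures named above, the following is a minimal sketch of how they can be computed for one binary biomarker. The function name `binary_metrics` and the ten toy labels are assumptions made for this example; they are not the study's code or data.

```python
from collections import Counter


def binary_metrics(truth, pred):
    """Accuracy, sensitivity, specificity, and Cohen's kappa for 0/1 labels.

    `truth` holds the reference (pathology) status and `pred` the predicted
    status, with 1 = positive and 0 = negative. Hypothetical helper.
    """
    if len(truth) != len(pred) or len(truth) == 0:
        raise ValueError("truth and pred must be non-empty and equally long")

    counts = Counter(zip(truth, pred))
    tp = counts[(1, 1)]  # reference positive, predicted positive
    tn = counts[(0, 0)]  # reference negative, predicted negative
    fp = counts[(0, 1)]  # reference negative, predicted positive
    fn = counts[(1, 0)]  # reference positive, predicted negative
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")

    # Agreement expected by chance, from the marginal totals of both raters.
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else float("nan")

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Toy labels for illustration only (not patient data).
toy_truth = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
toy_result = binary_metrics(toy_truth, toy_pred)
```

Passing the two prediction rounds as `truth` and `pred` yields the kappa used to assess reproducibility, since Cohen's kappa is symmetric in its two raters.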

Results: The model achieved the highest accuracy for HER2 prediction (83.6%, κ = 0.51), followed by ER (81.8%, κ = 0.44) and PR (76.4%, κ = 0.39). For high Ki-67 expression, the sensitivity was 88.9% with moderate overall agreement (κ = 0.55). Inter-prediction agreement was substantial to almost perfect for all biomarkers (κ = 0.69-0.83).
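The verbal labels attached to the kappa values above ("moderate", "substantial", "almost perfect") match the widely used Landis and Koch bands, although the abstract does not state which scale was applied. Assuming that scale, a small sketch of the mapping follows; the function name `landis_koch_label` is hypothetical.

```python
def landis_koch_label(kappa):
    """Map a Cohen's kappa value to its Landis and Koch verbal label.

    Assumes the conventional bands: <0 poor, 0-0.20 slight, 0.21-0.40 fair,
    0.41-0.60 moderate, 0.61-0.80 substantial, 0.81-1.00 almost perfect.
    """
    if kappa > 1:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label


# Kappa values as reported in the Results paragraph.
reported = {"HER2": 0.51, "ER": 0.44, "PR": 0.39, "Ki-67": 0.55}
labels = {name: landis_koch_label(k) for name, k in reported.items()}
```

Under this assumption, the inter-prediction range of 0.69 to 0.83 spans the "substantial" and "almost perfect" bands, consistent with the wording used in the Results.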

Conclusion: ChatGPT-4o successfully predicted IHC biomarker status using only structured textual data. Its performance was comparable to that of radiomics models, offering a feasible and accessible AI tool to support early clinical decision-making, especially in resource-limited settings or before IHC results are available.

Keywords: Artificial intelligence; Biomarkers; Breast neoplasms; Immunohistochemistry; Large language models; Natural language processing.