Comparative evaluation of ChatGPT-4o and DeepSeek-V3 in head and neck oncology

Acta Otolaryngol. 2025 Dec;145(12):1199-1207. doi: 10.1080/00016489.2025.2563035. Epub 2025 Oct 28.

Abstract

Background: Large language models (LLMs) are increasingly used in clinical decision-making and patient education, including in complex specialties such as head and neck cancer (HNC).

Objective: To evaluate the performance of ChatGPT-4o and DeepSeek-V3 in answering HNC-related clinical questions.

Methods: A set of 154 questions across six clinical categories was submitted twice to both models. Responses were independently graded by head and neck surgeons using a four-point accuracy scale. Accuracy, reproducibility, and inter-model agreement were assessed.

Results: ChatGPT-4o and DeepSeek-V3 provided "comprehensive/correct" answers in 92.2% and 89.6% of cases, respectively (p = .42). The two models received the same accuracy rating in 85.1% of cases; however, chance-corrected agreement between them remained low (Cohen's κ = 0.12; ICC = 0.21, p = .006). DeepSeek-V3 scored higher than ChatGPT-4o in the Treatment category (96.3% vs. 81.5%, p = .08), while ChatGPT-4o scored higher in the Recovery, Complications, and Follow-up category (95.0% vs. 82.5%, p = .08); neither difference reached statistical significance. Reproducibility was high for both models (ChatGPT-4o: 96.1%; DeepSeek-V3: 96.8%).
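
Note on agreement statistics: High raw overlap together with a low κ, as reported above, is a known property of Cohen's κ when nearly all ratings fall into a single category: chance-expected agreement is then already high, so κ stays small. The following is a minimal Python sketch of that calculation, not the authors' analysis. The 2 × 2 counts are hypothetical, chosen only so that the marginal totals roughly resemble the reported percentages; the four-point scale is collapsed to correct/not correct for simplicity, and `cohen_kappa` is an illustrative helper name.

```python
def cohen_kappa(table):
    """Return (observed agreement, chance agreement, kappa) for a square
    contingency table: rows are rater A's categories, columns rater B's."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: share of items on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of the marginal proportions, summed per category.
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, p_e, kappa


# HYPOTHETICAL counts for 154 questions (NOT the study's data).
# Rows: model A rated correct / not correct.
# Columns: model B rated correct / not correct.
HYPOTHETICAL_TABLE = [
    [129, 13],
    [9, 3],
]

if __name__ == "__main__":
    p_o, p_e, kappa = cohen_kappa(HYPOTHETICAL_TABLE)
    print(f"observed agreement = {p_o:.3f}")
    print(f"chance agreement   = {p_e:.3f}")
    print(f"Cohen's kappa      = {kappa:.3f}")
```

With these invented counts, raw agreement is about 86% while κ is only about 0.14, because chance agreement alone is already about 83%.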

Conclusions: Both models demonstrated strong accuracy and consistency in HNC-related queries.

Significance: LLMs hold promise as reliable tools for clinical decision-making and patient education in HNC when used with careful consideration of their inherent limitations.

Keywords: ChatGPT; DeepSeek; Head and neck cancer; artificial intelligence; large language models.

Publication types

  • Comparative Study

MeSH terms

  • Clinical Decision-Making* / methods
  • Female
  • Generative Artificial Intelligence
  • Head and Neck Neoplasms* / therapy
  • Humans
  • Male
  • Middle Aged
  • Reproducibility of Results
  • Surveys and Questionnaires