Introduction: Artificial intelligence (AI) tools such as ChatGPT are increasingly accessed by the public for health-related advice; however, their accuracy in acute burn management remains uncertain. We aimed to assess the domain-specific accuracy and quality of ChatGPT's burn guidance for public use, compared with that of clinicians, using British Burn Association (BBA) guidelines as the benchmark.
Methods: A single-blinded, comparative study was conducted using 20 burn scenarios encompassing varying severities, types, and patient groups. Responses from clinicians and ChatGPT were evaluated blindly, in random order, by burns consultants across five domains: first aid, dressing, pain relief, referral, and safety warnings. Correctness was scored as 1 (correct) or 0 (incorrect) against BBA recommendations. Overall response quality was assessed using a modified Global Quality Score (mGQS; 1-5), with scores ≥4 considered clinically acceptable. McNemar's test and the Wilcoxon signed-rank test were used to compare domain accuracy and mean mGQS scores, respectively.
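For paired binary outcomes such as the per-domain correct/incorrect scores above, McNemar's test considers only the discordant pairs (scenarios where one rater was correct and the other was not). A minimal sketch of the exact (binomial) form of the test, using hypothetical discordant counts rather than the study's data:

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: pairs where rater A was correct and rater B was not.
    c: pairs where rater B was correct and rater A was not.
    Concordant pairs (both correct or both incorrect) do not enter the test.
    """
    n = b + c                      # total discordant pairs
    k = min(b, c)
    # Under H0 the discordant pairs split 50/50; sum the binomial tail
    # up to the smaller count and double it for a two-sided p-value.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(p, 1.0)             # cap, since doubling can exceed 1

# Hypothetical example: 1 vs. 8 discordant pairs across 20 scenarios.
print(mcnemar_exact(1, 8))  # 0.0390625
```

In practice this is available as `statsmodels.stats.contingency_tables.mcnemar` with `exact=True`; the hand-rolled version above just makes the arithmetic explicit.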
Results: Clinicians demonstrated higher overall domain accuracy than ChatGPT (88% vs. 78%, p = 0.031). Performance was comparable in first aid (85% for both) and referral (100% for both), with ChatGPT showing marginally lower accuracy in dressing (85% vs. 90%) and safety warnings (90% vs. 100%). Pain relief accuracy was notably lower for ChatGPT (30% vs. 65%, p = 0.023). Mean mGQS scores were higher for clinicians (4.33 ± 0.69 vs. 4.15 ± 0.63); however, this difference was not statistically significant.
Conclusion: ChatGPT provides generally safe and comprehensible initial burn guidance, comparable to clinicians in key domains. With further validation and inclusion of clear disclaimers, it can complement clinician-led care and serve as a valuable adjunct when timely professional input is unavailable.
Keywords: Artificial intelligence; Burn; ChatGPT; Decision; Safety.
Copyright © 2026 The Authors. Published by Elsevier Ltd. All rights reserved.