Performance of Google Bard and ChatGPT in mass casualty incidents triage

Am J Emerg Med. 2024 Jan:75:72-78. doi: 10.1016/j.ajem.2023.10.034. Epub 2023 Oct 29.

Abstract

Aim: To evaluate and compare ChatGPT, Google Bard, and medical students in performing START triage during mass casualty incidents.

Method: We conducted a cross-sectional analysis comparing ChatGPT, Google Bard, and medical students in mass casualty incident (MCI) triage using the Simple Triage And Rapid Treatment (START) method. A validated questionnaire with 15 diverse MCI scenarios was used to assess triage accuracy, and the responses underwent content analysis in the four START assessment categories: "Walking wounded," "Respiration," "Perfusion," and "Mental Status." The results were then compared statistically.
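
The four assessment categories named above correspond to the successive checks of the START algorithm. A minimal sketch of that decision sequence follows, assuming the thresholds from the commonly published version of START (respiratory rate above 30 per minute, absent radial pulse or capillary refill longer than 2 seconds, inability to follow simple commands). These thresholds are not stated in the abstract, and the class, field, and function names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record of the findings START asks for."""
    can_walk: bool
    breathing: bool
    breathes_after_airway_opened: bool = False
    respiratory_rate: int = 0          # breaths per minute
    radial_pulse: bool = True
    capillary_refill_s: float = 1.0    # seconds
    obeys_commands: bool = True


def start_triage(p: Patient) -> str:
    """Return a START category; thresholds are assumed, not from the abstract."""
    # 1. Walking wounded: anyone who can walk is tagged minor.
    if p.can_walk:
        return "MINOR"
    # 2. Respiration: no breathing even after airway repositioning is expectant.
    if not p.breathing:
        return "IMMEDIATE" if p.breathes_after_airway_opened else "EXPECTANT"
    if p.respiratory_rate > 30:
        return "IMMEDIATE"
    # 3. Perfusion: absent radial pulse or slow capillary refill.
    if not p.radial_pulse or p.capillary_refill_s > 2:
        return "IMMEDIATE"
    # 4. Mental status: cannot follow simple commands.
    if not p.obeys_commands:
        return "IMMEDIATE"
    # All checks passed, but the patient cannot walk.
    return "DELAYED"
```

For example, under these assumptions `start_triage(Patient(can_walk=False, breathing=True, respiratory_rate=22))` falls through every check and is tagged "DELAYED", while the same patient with `respiratory_rate=34` is tagged "IMMEDIATE".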

Results: Google Bard achieved a significantly higher accuracy than ChatGPT (60% vs. 26.67%, p = 0.002). For comparison, medical students achieved an accuracy of 64.3% in a previous study; the difference between Google Bard and medical students was not statistically significant (p = 0.211). Qualitative content analysis of the "Walking wounded," "Respiration," "Perfusion," and "Mental Status" categories indicated that Google Bard outperformed ChatGPT.
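
The two chatbot accuracies are consistent with whole-number scores out of the 15 scenarios. The short check below assumes one graded answer per scenario per chatbot. The counts of 9 and 4 are hypothetical values chosen because they reproduce the reported percentages; the abstract does not state them, and the sketch does not attempt to reproduce the reported p-values.

```python
N_SCENARIOS = 15  # number of MCI scenarios in the questionnaire


def accuracy_percent(correct: int, total: int = N_SCENARIOS) -> float:
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100 * correct / total, 2)


# Hypothetical counts that match the reported figures.
bard_correct = 9
chatgpt_correct = 4

print(accuracy_percent(bard_correct))     # 60.0
print(accuracy_percent(chatgpt_correct))  # 26.67
```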

Conclusion: Google Bard was superior to ChatGPT in correctly performing mass casualty incident triage, with an accuracy of 60% versus 26.67% for ChatGPT, a statistically significant difference (p = 0.002).

Keywords: Artificial intelligence; Disaster medicine; Mass casualty incident; Triage.

MeSH terms

  • Computer Simulation
  • Cross-Sectional Studies
  • Humans
  • Mass Casualty Incidents*
  • Search Engine
  • Triage* / methods