Background: Triage is an essential process in every emergency department, used to prioritize patients according to the urgency of their presentation. Effective and accurate triage is associated with better patient outcomes and resource allocation. Recently, artificial intelligence (AI) systems, including AI chatbots, have shown potential in automating complex clinical tasks such as triage.
Objectives: This study aims to assess the accuracy of an AI chatbot (ChatGPT-4o) in triaging emergency cases and to compare its performance with that of expert emergency physicians.
Methods: This cross-sectional observational study analyzed 60 emergency case scenarios. First, the scenarios were triaged by four expert emergency physicians, each with more than 10 years of experience; cases on which at least three of the four physicians agreed were included. Second, these cases were triaged by ChatGPT-4o, and its triage levels were compared with those of the expert clinicians. Cohen's kappa was used to measure the level of agreement.
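As an illustration of the agreement analysis described in the Methods, the following is a minimal Python sketch of unweighted Cohen's kappa together with a one-vs-rest sensitivity and specificity for a single triage level. The abstract does not state whether the kappa was weighted or how per-level sensitivity and specificity were derived, so both choices are assumptions here. The function names and the toy labels are hypothetical and are not the study data.

```python
from collections import Counter


def cohen_kappa(ref, pred):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(ref) != len(pred) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(ref)
    # Observed agreement: fraction of cases where both raters give the same level.
    observed = sum(r == p for r, p in zip(ref, pred)) / n
    # Chance agreement: product of the two raters' marginal proportions per level.
    ref_counts, pred_counts = Counter(ref), Counter(pred)
    expected = sum(ref_counts[k] * pred_counts[k] for k in ref_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def sensitivity_specificity(ref, pred, level):
    """One-vs-rest sensitivity and specificity for one triage level (assumed scheme)."""
    pairs = list(zip(ref, pred))
    tp = sum(r == level and p == level for r, p in pairs)
    fn = sum(r == level and p != level for r, p in pairs)
    tn = sum(r != level and p != level for r, p in pairs)
    fp = sum(r != level and p == level for r, p in pairs)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical toy labels for illustration only (NOT the study data).
consensus = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]  # physician consensus level
chatbot = [1, 1, 2, 3, 3, 3, 4, 5, 5, 5]    # chatbot-assigned level

kappa = cohen_kappa(consensus, chatbot)
print(round(kappa, 3))                                   # 0.75
print(sensitivity_specificity(consensus, chatbot, 4))    # (0.5, 1.0)
```

In this toy example the two raters agree on 8 of 10 cases, while chance agreement from the marginals is 0.2, giving a kappa of 0.75. A weighted kappa, which penalizes a one-level disagreement less than a four-level one, would be a reasonable alternative for ordinal triage scales.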
Results: Clinician consensus resulted in the inclusion of 46 cases. The overall kappa value for ChatGPT-4o was 0.695 (95% confidence interval 0.53036-0.85964), reflecting moderate to substantial agreement with expert clinicians. At Triage Level 1, ChatGPT-4o showed a sensitivity of 100% and a specificity of 97.67%; at Triage Level 5, sensitivity was 100% and specificity was 93.02%. Sensitivity was lowest (50%) at Triage Level 4.
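The reported figures can be sanity-checked with simple arithmetic. The sketch below assumes the interval is a symmetric normal-approximation interval (kappa ± 1.96 × SE); the abstract does not state how the interval was computed, so the implied standard error is an inference, not a reported value. The candidate fractions for specificity are likewise only counts that happen to be consistent with the reported percentages and 46 included cases; the abstract does not report the underlying counts.

```python
# Values reported in the abstract.
kappa = 0.695
ci_low, ci_high = 0.53036, 0.85964

# Assumption: two-sided 95% interval from a normal approximation.
z = 1.96

midpoint = (ci_low + ci_high) / 2      # should coincide with kappa if symmetric
half_width = (ci_high - ci_low) / 2
implied_se = half_width / z            # standard error implied by the interval

print(round(midpoint, 3))     # 0.695
print(round(implied_se, 4))   # 0.084

# Fractions consistent with the reported specificities (inferred, not reported):
# 43 reference-negative cases would leave 3 of the 46 cases at the level in question.
spec_level_1 = round(100 * 42 / 43, 2)
spec_level_5 = round(100 * 40 / 43, 2)
print(spec_level_1, spec_level_5)   # 97.67 93.02
```

Under these assumptions the interval is centered on the point estimate and corresponds to a standard error of about 0.084.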
Conclusion: ChatGPT-4o triage showed substantial agreement with expert physicians and high sensitivity for critical patients. Although not yet ready to replace clinical professionals, such AI tools could serve as effective decision-support resources, allowing health care teams to concentrate on the most urgent cases.
Keywords: AI; Chatbot; emergency department; machine learning; triage.
Copyright © 2025 Elsevier Inc. All rights reserved.