Performance of Three Conversational Artificial Intelligence Agents in Defining End-of-Life Care Terms

J Palliat Med. 2025 Aug;28(8):1102-1107. doi: 10.1089/jpm.2024.0526. Epub 2025 Mar 26.

Abstract

Background: Conversational artificial intelligence agents, or chatbots, are a transformational technology understudied in end-of-life care.

Methods: OpenAI's ChatGPT, Google's Bard, and Microsoft's Bing were asked to define "terminally ill," "end of life," "transitions of care," "actively dying," and provide three references. Outputs were scored by six physicians on a scale of 0-10 for accuracy, comprehensiveness, and credibility. Flesch-Kincaid Grade Level and Flesch Reading Ease (FRE) were used to calculate readability.

Results: Mean (standard deviation) scores for accuracy were 9 (1.9) for ChatGPT, 7.5 (2.4) for Bard, and 8.3 (2.4) for Bing. Comprehensiveness scores averaged 8.5 (1.7) for ChatGPT, 7.3 (2.1) for Bard, and 6.5 (2.3) for Bing. Credibility was low with a mean score of 3 (1.8). The mean FRE score was 41.7, and the mean grade level was 14.1, indicating low readability.

Conclusion: Chatbot outputs had important deficiencies that necessitated clinician oversight to prevent misinformation.
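
For readers unfamiliar with the two readability measures named in the Methods, the following is a minimal sketch of the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas. It is not the tool the authors used, which the abstract does not specify. The function names `count_syllables`, `flesch_scores`, and `readability` are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here for illustration, so its results may differ from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent unless the word ends in 'le' or 'ee'.
    if count > 1 and word.endswith("e") and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)


def flesch_scores(words: int, sentences: int, syllables: int) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) from raw counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    fre = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fre, fkgl


def readability(text: str) -> tuple[float, float]:
    """Tokenize text naively, then apply the two formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_scores(len(words), len(sentences), syllables)


if __name__ == "__main__":
    fre, fkgl = readability("The cat sat on the mat.")
    print(f"FRE={fre:.1f}, grade level={fkgl:.1f}")
```

A lower FRE and a higher grade level both indicate harder text, which is consistent with the abstract's reading of a mean FRE of 41.7 and a mean grade level of 14.1 as low readability.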

Keywords: conversational artificial intelligence; end-of-life care; large language models; palliative care.

MeSH terms

  • Artificial Intelligence*
  • Communication*
  • Comprehension
  • Humans
  • Terminal Care*
  • Terminology as Topic*