Using Artificial Intelligence to Label Free-Text Operative and Ultrasound Reports for Grading Pediatric Appendicitis

J Pediatr Surg. 2024 May;59(5):783-790. doi: 10.1016/j.jpedsurg.2024.01.033. Epub 2024 Feb 2.

Abstract

Purpose: Data science approaches to personalizing pediatric appendicitis management are hampered by small datasets and unstructured electronic medical records (EMR). Artificial intelligence (AI) chatbots based on large language models can structure free-text EMR data. We compared data extraction quality between ChatGPT-4 and human data collectors.

Methods: To train AI models to grade pediatric appendicitis preoperatively, several data collectors extracted detailed preoperative and operative data from 2100 children who underwent surgery for acute appendicitis. Collectors were trained for the task, with training judged adequate on the basis of satisfactory Kappa scores. ChatGPT-4 was prompted to structure free text from 103 random anonymized ultrasound and operative records in the dataset using the set variables and coding options, and to estimate the appendicitis severity grade from the operative report. A pediatric surgeon then adjudicated all data, identifying errors in each method.
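The Kappa scores mentioned in the Methods measure agreement between raters beyond what chance alone would produce. The abstract does not say which Kappa variant was used or how it was computed, so the following is only a minimal sketch of Cohen's kappa for two raters. The function name `cohen_kappa` and the example labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    Assumption: two raters, nominal categories. The study's actual
    Kappa variant and data are not given in the abstract.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must label the same non-empty set of items")

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected chance agreement, from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical grades assigned by two collectors to four reports.
collector_1 = [1, 1, 2, 2]
collector_2 = [1, 1, 2, 1]
kappa = cohen_kappa(collector_1, collector_2)
```

In this made-up example the raters agree on three of four reports (observed agreement 0.75), chance agreement is 0.5, and kappa is therefore 0.5.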

Results: Among the 44 ultrasound reports (42.7%) and 32 operative reports (31.1%) that were discordant in at least one field, 98% of the errors were found in the manual data extraction. The appendicitis grade was erroneously assigned manually in 29 patients (28.2%) and by ChatGPT-4 in 3 (2.9%). Across datasets, use of the AI chatbot avoided misclassification in 59.2% of the records (counting both report types), and it extracted data approximately 40 times faster.
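The percentages reported in the Results are consistent with a denominator of 103 records, the sample size given in the Methods. The short check below recomputes them from the stated counts. The helper name `pct` and the dictionary keys are illustrative only.

```python
TOTAL_RECORDS = 103  # records processed by ChatGPT-4, per the Methods

# Counts as stated in the Results.
counts = {
    "discordant ultrasound reports": 44,
    "discordant operative reports": 32,
    "grade errors, manual": 29,
    "grade errors, ChatGPT-4": 3,
}


def pct(count, total=TOTAL_RECORDS):
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)


percentages = {name: pct(count) for name, count in counts.items()}
for name, value in percentages.items():
    print(f"{name}: {value}%")
```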

Conclusion: The AI chatbot significantly outperformed manual data extraction in accuracy for ultrasound and operative reports, and correctly assigned the appendicitis grade. While wider validation is required and data safety concerns must be addressed, these AI tools show significant promise in improving the accuracy and efficiency of research data collection.

Level of evidence: Level III.

Keywords: Appendicitis grade; Artificial intelligence; Comparative study; Diagnosis; Pediatric appendicitis.

MeSH terms

  • Appendicitis* / diagnostic imaging
  • Appendicitis* / surgery
  • Artificial Intelligence
  • Child
  • Electronic Health Records
  • Humans
  • Surgeons*
  • Ultrasonography