Evaluating base and retrieval augmented LLMs with document or online support for evidence based neurology

NPJ Digit Med. 2025 Mar 4;8(1):137. doi: 10.1038/s41746-025-01536-y.

Abstract

Effectively managing evidence-based information is increasingly challenging. This study tested large language models (LLMs), including document- and online-enabled retrieval-augmented generation (RAG) systems, using 13 recent neurology guidelines across 130 questions. Results showed substantial variability. RAG improved accuracy compared to base models but still produced potentially harmful answers. RAG-based systems performed worse on case-based than knowledge-based questions. Further refinement and improved regulation are needed for safe clinical integration of RAG-enhanced LLMs.