Background: Developing diagnostic reasoning in nephrology is particularly challenging due to its pathophysiological complexity and reliance on abstract clinical data. Objective Structured Clinical Examinations (OSCEs) are pivotal for nephrology training but remain resource-intensive and difficult to scale. Generative artificial intelligence (AI) offers a promising alternative, yet its capacity to emulate nephrology-specific OSCEs has not been formally assessed.
Methods: We developed ECOSBot, a web-based tool powered by GPT-4o, to simulate both standardized patients and examiners for nephrology-focused OSCEs. In this multicenter prospective study, undergraduate medical students from five French medical schools interacted with ECOSBot across four clinical stations. All interactions were double-rated by nephrology faculty members to establish a gold standard. ECOSBot's performance was evaluated against this standard using four criteria (script coverage, authenticity, correctness and relevance) for patient simulation, and via checklists and competency-based ratings for examiner scoring. Usability was assessed using the Chatbot Usability Questionnaire (CUQ), adapted to include six items on feedback quality.
Results: Ninety-one students generated 2939 prompts across 184 OSCE sessions. ECOSBot demonstrated high fidelity in patient simulation: authenticity 98.6% [95% confidence interval (CI) 98.2-99.0], correctness 98.3% (95% CI 97.9-98.7) and relevance 99.2% (95% CI 98.9-99.5), including during exchanges not explicitly covered by the pre-specified scenario. As an examiner, ECOSBot showed strong agreement with human raters on global scores [intraclass correlation coefficient (ICC) = 0.94, 95% CI 0.91-0.96], and this agreement was consistent across case formats, training levels and institutions. However, scoring of attitude and communication skills was less reliable (ICC = 0.44, 95% CI 0.28-0.58). The median CUQ score was 69.7/100, and 91.7% of students found the tool highly useful for OSCE preparation in nephrology.
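As a rough illustration of how interval estimates like those above can be obtained, the sketch below computes a normal-approximation (Wald) 95% CI for a proportion. It rests on two assumptions that the abstract does not state: that the denominator for the patient-simulation criteria is the 2939 prompts, and that a Wald interval was used. The function name `wald_ci` is hypothetical and is not taken from the study.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(p: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion.

    p     : observed proportion, between 0 and 1
    n     : number of observations (assumed here: prompts, not sessions)
    level : confidence level, e.g. 0.95
    """
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    # Clamp to [0, 1], since the Wald interval can overshoot near the bounds.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Authenticity: 98.6% over an assumed denominator of 2939 prompts.
low, high = wald_ci(0.986, 2939)
print(f"{low * 100:.1f}-{high * 100:.1f}")  # prints "98.2-99.0"
```

Under these assumptions the authenticity interval is reproduced. For proportions this close to 100%, a Wilson or exact interval is often preferred, so the bounds reported for the other criteria may differ slightly from what this sketch returns.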
Conclusions: ECOSBot reliably simulated both roles in nephrology OSCEs with high fidelity and strong alignment with expert ratings. While challenges remain for the assessment of subjective skills, this tool offers a scalable and autonomous solution to enhance nephrology education.
Keywords: LLM-powered simulation; generative AI; medical education technology; nephrology; virtual standardized patient.
© The Author(s) 2025. Published by Oxford University Press on behalf of the ERA.