Introduction: High-value care (HVC) holds that good history taking and physical examination should lead to risk stratification that drives the decision to use or withhold diagnostic testing. This study describes the development of a series of virtual standardized patient (VSP) cases and provides preliminary evidence supporting their ability to provide experiential learning in HVC.
Methods: This pilot study used VSPs, which are natural language processing-based patient avatars, within the USC Standard Patient platform. Faculty consensus was used to develop the cases, including the optimal diagnostic testing strategies, treatment options, and scored content areas. First-year resident physician learners attended two 90-minute didactic sessions before completing the cases in a computer laboratory. For each case, learners used typed text to interview the avatar for history taking and then completed physical examination, differential diagnosis, diagnostic testing, and treatment modules. Learners chose a primary diagnosis and 2 alternative "possible" diagnoses from a list of 6 to 7 choices, diagnostic testing options from an extensive list, and treatments from a brief list of 6 to 9 choices. For the history-taking module, both faculty and the platform scored the learners, and faculty assessed the appropriateness of avatar responses. Four randomly selected learner-avatar interview transcripts for each case were double rated by faculty. Intraclass correlations were calculated for interrater reliability, and Spearman ρ was used to determine the correlation between the platform and faculty rankings of learners' history-taking scores.
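As an illustration of the rank correlation named in the Methods, the following is a minimal sketch of Spearman ρ computed as the Pearson correlation of average ranks. It is not the study's analysis code: the function names and the score lists are hypothetical, and the values are invented for illustration only.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation applied to the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sx * sy)


# Hypothetical history-taking scores (NOT data from the study).
faculty_scores = [0.62, 0.48, 0.71, 0.55, 0.40]
platform_scores = [0.58, 0.41, 0.66, 0.39, 0.43]
rho = spearman_rho(faculty_scores, platform_scores)
```

Because only ranks enter the calculation, ρ reflects whether the two scorers order learners similarly, not whether their absolute scores agree.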
Results: Fourteen learners completed 8 VSP cases. Investigators reviewed 112 transcripts (4646 learner query-avatar responses). Mean interrater reliability was 0.87 for learner query scoring and 0.83 for avatar response appropriateness. Mean learner success for history taking was scored at 57% by the faculty and at 51% by the platform (Spearman ρ for learner rankings = 0.80, P = 0.02). The mean avatar appropriate response rate was 85.6% across all cases. Learners included the correct diagnosis among their 3 choices 82% of the time, ordered a median (interquartile range) of 2 (2) unnecessary tests, and completed 56% of optimal treatments.
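The interrater reliability values reported above are intraclass correlations. The abstract does not state which ICC form was used, so the sketch below assumes a two-way random-effects, absolute-agreement, single-rater ICC, often written ICC(2,1). The function name and the example ratings are hypothetical and are not taken from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per rated item and one column per rater.

    Assumption: two-way random effects, absolute agreement, single rater.
    """
    n = len(ratings)  # number of rated items
    if n < 2:
        raise ValueError("need at least 2 rated items")
    k = len(ratings[0])  # number of raters
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with at least 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_error) / denom


# Hypothetical ratings of 4 items by 2 faculty raters (NOT study data).
example_ratings = [[3, 3], [1, 2], [4, 4], [2, 2]]
example_icc = icc_2_1(example_ratings)
```

Under this form, a constant offset between raters lowers the coefficient even when the raters order the items identically, which is why the choice of ICC form matters when interpreting a reported value.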
Conclusions: Our avatar appropriate response rate was comparable to rates reported in past work using similar platforms. The simulations give detailed insights into the thoroughness of learner history taking and testing choices and, with further refinement, should support learning in HVC.