Background: Because clinicians vary in their reasoning, methods used to develop scoring checklists for standardised patient-based assessment must be valid. The use of data elicited by experts solving problems independently has been advocated as a method of setting performance standards.
Aims: To determine the degree of concurrence among, and the common characteristics of, items independently elicited by doctors during patient encounters, and to assess the number of experts needed to derive reliable performance standards.
Methods: Six experienced internists worked up the same 7 chief complaints with standardised patients (SPs). A stimulated recall of each recorded encounter was then performed. The degree of concurrence was computed for the history and physical examination information collected and for the diagnostic hypotheses generated. Reliability was derived from generalisability analyses.
Results: Per case, experts elicited a mean of 114 information items (SD = 15) and generated 30 diagnostic hypotheses (SD = 6). High concurrence (80-100%) was observed for a mean of 22 information items (20%; SD = 6) and 7 diagnostic hypotheses (24%; SD = 2). More than a third of the 153 highly concurrent information items were clarification questions. Data from at least 3 doctors were needed to obtain a reliability of 0.80 or higher when deriving the scoring checklists.
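The rater requirement follows from a standard decision-study projection. As a minimal sketch (the abstract does not specify the exact generalisability design, so a one-facet design with checklist items as the object of measurement and experts as the facet of generalisation is assumed here), the projected coefficient for n experts is

E\rho^2(n) = \frac{\sigma^2_{\mathrm{item}}}{\sigma^2_{\mathrm{item}} + \sigma^2_{\mathrm{res}}/n}

where \sigma^2_{\mathrm{item}} is the variance attributable to checklist items and \sigma^2_{\mathrm{res}} is the residual (expert-related) error variance; n is increased until the projected coefficient reaches 0.80.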
Conclusion: The limited concurrence in data elicited by clinicians during a patient encounter supports the use of high-fidelity methods to develop the performance checklists used in SP-based assessment. It also suggests that relying solely on the information collected may not be sufficient to assess clinical competence. Additional criteria, such as the structure and style of the work-up, should be explored further.