Undesired variance due to examiner stringency/leniency effect in communication skill scores assessed in OSCEs

Adv Health Sci Educ Theory Pract. 2008 Dec;13(5):617-32. doi: 10.1007/s10459-007-9068-0. Epub 2007 Jul 3.


Physician-patient communication is a clinical skill that can be learned and has a positive impact on patient satisfaction and health outcomes. A concerted effort at all medical schools is now directed at teaching and evaluating this core skill. Student communication skills are often assessed by an Objective Structured Clinical Examination (OSCE). However, it is unknown what sources of error variance are introduced into examinee communication scores by various OSCE components. This study primarily examined the effect different examiners had on the evaluation of students' communication skills assessed at the end of a family medicine clerkship rotation. The communication performance of clinical clerks from the Classes of 2005 and 2006 was assessed using six OSCE stations. Performance was rated at each station using the 28-item Calgary-Cambridge guide. Item Response Theory analysis using a multifaceted Rasch model was used to partition the various sources of error variance and generate a "true" communication score from which the effects of examiner, case, and items are removed. Variance and reliability of scores were as follows: communication scores (.20 and .87), examiner stringency/leniency (.86 and .91), case (.03 and .96), and item (.86 and .99), respectively. All facet scores were reliable (.87-.99). Examiner variance (.86) was more than four times the examinee variance (.20). About 11% of the clerks had their outcome status shift when "true" rather than observed/raw scores were used. There was large variability in examinee scores due to variation in examiner stringency/leniency behaviors, which may affect pass/fail decisions. Exploring the benefits of examiner training and employing "true" scores generated using Item Response Theory analyses prior to making pass/fail decisions are recommended.
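The idea of removing examiner effects from a score can be illustrated with a minimal sketch. It is not the authors' analysis: it assumes a simplified dichotomous many-facet Rasch model in which the log-odds of credit on an item equal examinee ability minus examiner severity, case difficulty, and item difficulty (all in logits). The function names and every numeric value below are hypothetical, chosen only to show how the same examinee can receive different expected raw scores from a lenient and a severe examiner.

```python
import math


def success_probability(ability, examiner_severity, case_difficulty, item_difficulty):
    """Probability of credit on one item under a dichotomous
    many-facet Rasch model (assumed simplification; all values in logits)."""
    logit = ability - examiner_severity - case_difficulty - item_difficulty
    return 1.0 / (1.0 + math.exp(-logit))


def expected_raw_score(ability, examiner_severity, case_difficulty, item_difficulties):
    """Expected number of items credited at one station."""
    return sum(
        success_probability(ability, examiner_severity, case_difficulty, d)
        for d in item_difficulties
    )


# Hypothetical inputs: 28 items of equal difficulty, one examinee, one case.
item_difficulties = [0.0] * 28
ability = 0.5

# Same examinee rated by a lenient, an average, and a severe examiner.
lenient = expected_raw_score(ability, -1.0, 0.0, item_difficulties)
average = expected_raw_score(ability, 0.0, 0.0, item_difficulties)
severe = expected_raw_score(ability, 1.0, 0.0, item_difficulties)

print(f"lenient examiner: {lenient:.1f} of 28")
print(f"average examiner: {average:.1f} of 28")
print(f"severe examiner:  {severe:.1f} of 28")
```

In this sketch the expected score under an examiner of average severity (severity 0) plays the role of an adjusted score: it depends on the examinee's ability and not on which examiner happened to be assigned. A full analysis would estimate all facet parameters jointly from the rating data rather than fixing them by hand.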

MeSH terms

  • Clinical Clerkship
  • Clinical Competence
  • Communication*
  • Educational Measurement / methods*
  • Educational Measurement / standards
  • Faculty, Medical
  • Family Practice / education
  • Female
  • Humans
  • Judgment
  • Logistic Models
  • Male
  • Observer Variation
  • Reproducibility of Results
  • Self Concept