A systematic review of the reliability of objective structured clinical examination scores

Med Educ. 2011 Dec;45(12):1181-9. doi: 10.1111/j.1365-2923.2011.04075.x. Epub 2011 Oct 11.


Context: The objective structured clinical examination (OSCE) comprises a series of simulations used to assess the skill of medical practitioners in the diagnosis and treatment of patients. It is often used in high-stakes examinations, so it is important to assess its reliability and validity.

Methods: The published literature was searched (PsycINFO, PubMed) for OSCE reliability estimates (coefficient alpha and generalisability coefficients) computed either across stations or across items within stations. Coders independently recorded information about each study. A meta-analysis of the available literature was conducted and sources of systematic variance in the estimates were examined.
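
For readers unfamiliar with coefficient alpha, the following is a minimal sketch of the standard textbook (Cronbach) formula applied across stations. It is not the authors' code; the function name `cronbach_alpha` and the example scores are hypothetical and chosen only for illustration. Each row is assumed to hold one examinee's scores, one per station.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a table of scores.

    scores: list of rows, one row per examinee, one column per station
    (or per item within a station). Hypothetical helper for illustration.
    """
    k = len(scores[0])  # number of stations (columns)
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two stations and two examinees")

    # Sample variance of each station's scores across examinees.
    columns = list(zip(*scores))
    sum_station_variances = sum(variance(col) for col in columns)

    # Sample variance of each examinee's total score.
    total_variance = variance([sum(row) for row in scores])

    # alpha = k / (k - 1) * (1 - sum of station variances / total variance)
    return k / (k - 1) * (1 - sum_station_variances / total_variance)


# Made-up scores for four examinees on three stations.
example_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
alpha = cronbach_alpha(example_scores)
```

The same function applies to the within-station case if each column is an item within one station rather than a whole station.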

Results: A total of 188 alpha values from 39 studies were coded. The overall (summary) alpha across stations was 0.66 (95% confidence interval [CI] 0.62-0.70); the overall alpha within stations across items was 0.78 (95% CI 0.73-0.82). Better-than-average reliability was associated with a greater number of stations and a higher number of examiners per station. Interpersonal skills were evaluated less reliably across stations and more reliably within stations compared with clinical skills.

Conclusions: Overall scores on the OSCE are often not very reliable. It is more difficult to reliably assess communication skills than clinical skills when considering both as general traits that should apply across situations. It is generally helpful to use two examiners and large numbers of stations, but some OSCEs appear more reliable than others for reasons that are not yet fully understood.

Publication types

  • Meta-Analysis
  • Review
  • Systematic Review

MeSH terms

  • Clinical Competence / standards*
  • Education, Medical
  • Education, Medical, Undergraduate / standards
  • Educational Measurement / methods*
  • Educational Measurement / standards*
  • Humans
  • Medical History Taking / standards
  • Reproducibility of Results