The inter-rater reliability and internal consistency of a clinical evaluation exercise

J Gen Intern Med. 1992 Mar-Apr;7(2):174-9. doi: 10.1007/BF02598008.


Objective: To assess the internal consistency and inter-rater reliability of a clinical evaluation exercise (CEX) format designed to be easy to use yet detailed enough to achieve uniform recording of the observed examination.

Design: A comparison of 128 CEXs conducted for 32 internal medicine interns by full-time faculty. This paper reports alpha coefficients as measures of internal consistency and several measures of inter-rater reliability.
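As an illustration of the internal-consistency measure named above, the following is a minimal sketch of Cronbach's alpha, assuming each completed form is a row of numeric item ratings. The function name `cronbach_alpha` and the `forms` data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per form, one column per item)."""
    k = len(scores[0])  # number of items on the form
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two forms")
    # Sample variance of each item across forms
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the per-form total score
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 5 forms, 3 items each (illustrative only, not study data)
forms = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(forms)
print(round(alpha, 3))  # 0.963 for this made-up data
```

Values near 1 indicate that the items within a section vary together across forms; the statistic says nothing about whether two raters agree with each other.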

Setting: A university internal medicine program. Observations were conducted at the end of the internship year.

Participants: The participants were 32 interns, and the observers were 12 full-time faculty members in the department of medicine. The entire intern group was included in order to optimize the spectrum of abilities represented. Patients were recruited by the chief resident from the inpatient medical service on the basis of their ability and willingness to participate.

Intervention: Each intern was observed twice, with two examiners present at each CEX. The examiners received a standardized preparation and used a format developed over five years of pilot studies.

Measurements and main results: The format appeared to have excellent internal consistency; alpha coefficients ranged from 0.79 to 0.99. Inter-rater reliability, however, was considerably weaker, and multiple methods of determining it yielded similar results: intraclass correlations ranged from 0.23 to 0.50, and generalizability coefficients ranged from a low of 0.00 for the overall rating of the CEX to a high of 0.61 for the physical examination section. Transforming scores to eliminate rater effects and dichotomizing results into pass-fail did not appear to enhance the reliability results.
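To make the intraclass correlation concrete, here is a minimal sketch of a one-way random-effects ICC for single ratings, assuming each observed exercise is scored by the same number of raters. The abstract does not state which ICC form was used, so this choice is an assumption; the function name `icc_oneway` and the `pairs` data are hypothetical.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings.

    ratings: list of rows, one row per observed exercise,
    each row holding the scores given by k raters.
    """
    n = len(ratings)     # number of observed exercises
    k = len(ratings[0])  # raters per exercise
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-exercise mean square
    msb = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-exercise mean square (disagreement between raters)
    msw = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical overall ratings from two examiners on 5 exercises
# (illustrative only, not study data)
pairs = [(6, 7), (5, 5), (8, 6), (4, 6), (7, 7)]
icc = icc_oneway(pairs)
print(round(icc, 2))  # 0.4 for this made-up data
```

In this sketch the coefficient falls as disagreement between the two examiners grows relative to the spread between exercises, which is how a form can show high alpha coefficients and still have a low intraclass correlation.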

Conclusions: Although the CEX is a valuable didactic tool, its psychometric properties preclude reliable assessment of clinical skills when it is used as a one-time observation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Clinical Competence*
  • Humans
  • Internship and Residency*
  • Medical History Taking
  • Observer Variation
  • Pennsylvania
  • Physical Examination
  • Reproducibility of Results