Objective: To determine whether raters using the American Board of Internal Medicine (ABIM) Resident Evaluation Form can detect differences among residents in clinical competence.
Design: Cross-sectional study.
Setting: Inpatient general medicine service in a university-affiliated public hospital.
Participants: University-based internal medicine (UCIM) residents (ABIM certifying examination pass rate, 91%; mean score, 95th percentile), community hospital-based internal medicine (CHIM) residents (ABIM examination pass rate, 68%; mean score, 42nd percentile), and residents from three university-based non-internal medicine (UC non-IM) programs, all assigned to the same inpatient general medicine service over a three-year period. A total of 489 evaluations of 110 postgraduate-year-one residents were analyzed.
Measurements and main results: Mean ratings for the UCIM residents were significantly higher than those for the CHIM or UC non-IM residents (analysis of variance [ANOVA], p < 0.05). Variance was smallest for the UCIM residents (F test, p < 0.01), and only the UCIM residents' mean scores were in the "superior" range (7-9) in all evaluated categories. The mean ratings for the CHIM residents while at the university-affiliated hospital were not significantly different from the ratings of the same residents at their home hospital. The ratings for the CHIM residents at either site were significantly lower than those for the UCIM residents in all categories (ANOVA, p < 0.05). Factor analysis revealed a single factor accounting for 76% of the variance among the ratings, with all dimensions loading highly on that factor (0.75-0.95), which is evidence of a "halo" effect. Mean interrater agreement over all variables was 0.87, indicating good consistency among raters.
Conclusions: Ratings on the ABIM Resident Evaluation Form detect global differences among residents in clinical competence in the expected direction based on type of training program and performance on the ABIM certification examination, but fail to differentiate among the nine evaluated dimensions of clinical care. This rating method may be valid for assessing overall clinical performance, but is less useful for providing feedback in specific areas to individual residents.
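Illustrative sketch: The "halo" pattern described above (one factor absorbing most of the variance, with every dimension loading highly on it) can be reproduced on synthetic data. The sketch below is not the authors' analysis and uses none of the study's data. It assumes each simulated rating is an overall impression plus independent noise, and it substitutes a principal-component approximation (power iteration on the correlation matrix) for a formal factor analysis. All function names, parameters, and values are hypothetical.

```python
import math
import random


def simulate_ratings(n_residents, n_dimensions, halo_sd, noise_sd, seed=0):
    """Synthetic ratings: one shared 'overall impression' per resident plus
    independent noise per dimension (an assumed model, not the study's data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_residents):
        overall = rng.gauss(0.0, halo_sd)
        rows.append([overall + rng.gauss(0.0, noise_sd)
                     for _ in range(n_dimensions)])
    return rows


def correlation_matrix(rows):
    """Pearson correlation matrix of the rating dimensions (columns)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(k)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows)
             / ((n - 1) * sds[a] * sds[b])
             for b in range(k)]
            for a in range(k)]


def leading_component(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    k = len(matrix)
    vector = [1.0 / math.sqrt(k)] * k
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(k))
                   for i in range(k)]
        eigenvalue = math.sqrt(sum(x * x for x in product))
        vector = [x / eigenvalue for x in product]
    return eigenvalue, vector


def first_component_summary(rows):
    """Share of total variance on the first component, and its loadings."""
    corr = correlation_matrix(rows)
    eigenvalue, vector = leading_component(corr)
    # The trace of a correlation matrix equals the number of dimensions.
    share = eigenvalue / len(corr)
    loadings = [abs(x) * math.sqrt(eigenvalue) for x in vector]
    return share, loadings


# Nine dimensions, echoing the form's nine rated dimensions; the spread
# parameters are arbitrary choices for illustration.
halo_share, halo_loadings = first_component_summary(
    simulate_ratings(200, 9, halo_sd=1.0, noise_sd=0.5))
flat_share, flat_loadings = first_component_summary(
    simulate_ratings(200, 9, halo_sd=0.0, noise_sd=1.0))

print(f"with halo:    first component share = {halo_share:.2f}")
print(f"without halo: first component share = {flat_share:.2f}")
```

When the shared impression dominates, the first component carries most of the variance and all nine loadings are high, so the individual dimensions add little information beyond the global score. With no shared impression, the variance spreads across components.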