BEME systematic review: predictive values of measurements obtained in medical schools and future performance in medical practice

Med Teach. 2006 Mar;28(2):103-16. doi: 10.1080/01421590600622723.


Background: The effectiveness of medical education programs is most meaningfully measured by the performance of their graduates.

Objectives: To assess the value of measurements obtained in medical schools in predicting future performance in medical practice.

Search strategy: The English-language literature from 1955 to 2004 was searched using MEDLINE, Embase, Cochrane's EPOC (Effective Practice and Organization of Care Group), Controlled Trial databases, ERIC, British Education Index, PsycINFO, Timelit and Web of Science, supplemented by hand searching of medical education journals.

Inclusion & exclusions: Selected studies included students assessed or followed up to internship, residency and/or practice after postgraduate training. Assessment systems and instruments studied (predictors) were the National Board of Medical Examiners (NBME) Part I and Part II examinations, preclinical and clerkship grade-point average, Objective Structured Clinical Examination (OSCE) scores, and undergraduate dean's rankings and honor society membership. Outcome measures were residency supervisor ratings, NBME Part III, residency in-training examinations, American Specialty Board examination scores, and on-the-job practice performance.

Data extraction: Data were extracted using a modified BEME data extraction form covering study objectives, design, sample variables, statistical analysis and results. All included studies are summarized in tabular form.

Data analysis and synthesis: Quantitative meta-analysis and qualitative approaches were used for data analysis and synthesis, including assessment of the methodological quality of the included studies.
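As an illustration of what a meta-analysis of correlation coefficients can involve, the sketch below pools study-level correlations with a fixed-effect Fisher z approach. This is an assumption for illustration only: the abstract does not state which pooling model the review used, and the function name `pool_correlations` and the example inputs are hypothetical, not data from the included studies.

```python
import math


def pool_correlations(studies):
    """Fixed-effect pooled correlation via Fisher's z transform.

    studies: list of (r, n) pairs, where r is a study's correlation
    coefficient and n is its sample size (n must exceed 3).
    Returns (pooled_r, ci_low, ci_high) with a 95% confidence interval.
    """
    # Fisher's z has approximate variance 1 / (n - 3), so the
    # inverse-variance weight of each study is n - 3.
    weights = [n - 3 for _, n in studies]
    z_values = [math.atanh(r) for r, _ in studies]

    total_weight = sum(weights)
    z_pooled = sum(w * z for w, z in zip(weights, z_values)) / total_weight
    standard_error = 1.0 / math.sqrt(total_weight)

    z_crit = 1.959963984540054  # two-sided 95% normal quantile
    ci_low = math.tanh(z_pooled - z_crit * standard_error)
    ci_high = math.tanh(z_pooled + z_crit * standard_error)

    # Back-transform from the z scale to the correlation scale.
    return math.tanh(z_pooled), ci_low, ci_high


# Hypothetical (r, n) pairs, NOT taken from the review.
example_studies = [(0.20, 120), (0.25, 300), (0.18, 85)]
pooled_r, ci_low, ci_high = pool_correlations(example_studies)
print(f"pooled r = {pooled_r:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

Because the interval is built on the z scale around the pooled estimate and then back-transformed, the pooled correlation always lies inside its own confidence interval. A random-effects model would be a reasonable alternative when study designs vary widely.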

Results: Of 569 studies retrieved with our search strategy, 175 full-text studies were reviewed. A total of 38 studies met our inclusion criteria and 19 had sufficient data to be included in a meta-analysis of correlation coefficients. The highest correlation between predictor and outcome was between NBME Part II and NBME Part III (r = 0.72, 95% CI 0.30-0.49) and the lowest was between NBME Part I and supervisor rating during residency (r = 0.22, 95% CI 0.13-0.30). The approach to studying the predictive value of assessment tools varied widely between studies and no consistent approach could be identified. Overall, undergraduate grades and rankings were moderately correlated with internship and residency performance. Performance on similar instruments was more closely correlated. Few studies assessed practice performance beyond postgraduate training programs.

Conclusions: There is a need for a more consistent and systematic approach to studies of the effectiveness of undergraduate assessment systems and tools and their predictive value. Although existing tools do appear to have low to moderate correlation with postgraduate training performance, little is known about their relationship to longer-term practice patterns and outcomes.

Publication types

  • Review
  • Systematic Review

MeSH terms

  • Clinical Competence
  • Education, Medical, Graduate
  • Education, Medical, Undergraduate*
  • Educational Measurement / standards*
  • Humans
  • Professional Practice / standards*