Purpose: To appraise the reported validity and reliability of evaluation methods used in high-quality trials of continuing medical education (CME).
Method: The authors conducted a systematic review (1981 to February 2006) by hand-searching key journals and searching electronic databases. Eligible articles were written in English and reported studies of CME effectiveness that used randomized controlled trials or historic or concurrent comparison designs, were conducted in the United States or Canada, and involved at least 15 physicians. Data were abstracted by sequential double review, using a traditional approach to classifying validity and reliability.
Results: Of 136 eligible articles, 47 (34.6%) reported the validity or reliability of at least one evaluation method, for a total of 62 methods; 31 methods were drawn from previous sources. The most common targeted outcome was practice behavior (21 methods). Validity was reported for 31 evaluation methods, including content (16), concurrent criterion (8), predictive criterion (1), and construct (5) validity. Reliability was reported for 44 evaluation methods, including internal consistency (20), interrater (16), intrarater (2), equivalence (4), and test-retest (5) reliability. When reported, statistical tests yielded modest evidence of validity and reliability. Translated to the contemporary classification approach, the data indicate that internal structure validity evidence was reported more often than other categories of validity evidence.
Conclusions: The evidence for CME effectiveness is limited by weaknesses in the reported validity and reliability of evaluation methods. Educators should devote more attention to the development and reporting of high-quality CME evaluation methods and to emerging guidelines for establishing the validity of CME evaluation methods.