Objective: To examine the reliability of scores obtained from a proposed critical appraisal tool (CAT).
Study design and setting: Based on a random sample of 24 health-related research papers, the scores from the proposed CAT were examined using intraclass correlation coefficients (ICCs), generalizability theory, and participants' feedback.
Results: Across all research papers, the ICC for four participants was 0.83 (consistency) and 0.74 (absolute agreement). Among individual research designs, the highest ICC (consistency) was for qualitative research (0.91) and the lowest was for descriptive, exploratory, and observational research (0.64). The G study showed a moderate research design effect (32%) for scores averaged across all papers, concentrated mainly in the Sampling, Results, and Discussion categories (44%, 36%, and 34%, respectively). Within each research design, the paper effect accounted for the majority of score variance (53-70%), with small to moderate rater or paper×rater interaction effects (0-27%).
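The two ICC variants reported above (consistency vs. absolute agreement, averaged over raters) can be derived from a two-way ANOVA decomposition of a papers×raters score matrix. The sketch below is illustrative only, assuming the average-measures formulas of McGraw and Wong; the matrix of scores is hypothetical, not data from this study.

```python
import numpy as np

def icc_consistency_agreement(X):
    """Average-measures ICCs from a papers x raters score matrix.

    Assumes McGraw & Wong (1996) two-way formulas:
        consistency ICC(C,k) = (MSR - MSE) / MSR
        agreement   ICC(A,k) = (MSR - MSE) / (MSR + (MSC - MSE) / n)
    where MSR, MSC, MSE are the paper, rater, and residual mean squares.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    grand = X.mean()
    row = X.mean(axis=1, keepdims=True)   # per-paper means
    col = X.mean(axis=0, keepdims=True)   # per-rater means
    msr = k * ((row - grand) ** 2).sum() / (n - 1)                  # papers
    msc = n * ((col - grand) ** 2).sum() / (k - 1)                  # raters
    mse = ((X - row - col + grand) ** 2).sum() / ((n - 1) * (k - 1))  # residual
    icc_c = (msr - mse) / msr
    icc_a = (msr - mse) / (msr + (msc - mse) / n)
    return icc_c, icc_a

# Hypothetical CAT scores: 5 papers rated by 4 participants.
scores = [[8, 7, 8, 9],
          [5, 5, 6, 6],
          [9, 8, 9, 9],
          [4, 3, 4, 5],
          [7, 6, 7, 8]]
icc_c, icc_a = icc_consistency_agreement(scores)
```

When raters differ systematically in leniency, the rater mean square exceeds the residual and absolute agreement falls below consistency, mirroring the 0.74 vs. 0.83 pattern reported above.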
Conclusions: Possible reasons for the research design effect are that participants were unfamiliar with some of the research designs and that papers were not matched to participants' expertise. Even so, the proposed CAT shows considerable promise as a tool applicable across a wide range of research designs.
Copyright © 2012 Elsevier Inc. All rights reserved.