Objectives: This paper had two objectives: a) to determine what the conclusions of systematic reviews reveal about the evidence base of medicine; and b) to determine whether two readers draw similar conclusions from the same review, and whether those conclusions match the authors' own.
Methods: Three methodologists (two per review) rated 160 Cochrane systematic reviews (issue 1, 1998) using pre-established conclusion categories. Disagreements were resolved by discussion to reach a consensus score for each review. The review authors were asked to use the same categories to designate their intended conclusion. Interrater agreement was then calculated.
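As an illustration of how agreement between two raters on categorical conclusions can be quantified, the sketch below computes Cohen's kappa, which corrects observed agreement for agreement expected by chance. The abstract does not state which agreement statistic was used, so the choice of Cohen's kappa is an assumption, and the function name `cohens_kappa` and the four example ratings are hypothetical (only the category labels come from the Results).

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same items.

    Assumption: the source does not name its agreement statistic;
    Cohen's kappa is used here purely for illustration.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")

    n = len(ratings_a)

    # Proportion of items on which the two raters chose the same category.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for four reviews (not data from the study).
reader_1 = ["positive effect", "insufficient evidence",
            "evidence of no effect", "positive effect"]
reader_2 = ["positive effect", "insufficient evidence",
            "insufficient evidence", "positive effect"]

kappa = cohens_kappa(reader_1, reader_2)
print(f"kappa = {kappa:.2f}")
```

The same function could be applied to a reader's rating paired with the authors' designated conclusion for each review, which is the second comparison the Methods describe.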
Results: Interrater agreement between two readers was 0.68 and 0.72, and between readers and authors, 0.32. The largest categories assigned by methodologists were "positive effect" (22.5%), "insufficient evidence" (21.3%), and "evidence of no effect" (20.0%). The largest categories assigned by authors were "insufficient evidence" (32.4%), "possibly positive" (28.6%), and "positive effect" (26.7%).
Conclusions: The number of reviews indicating that modern biomedical interventions show either no effect or insufficient evidence is surprisingly high. Interrater disagreements suggest that a surprising degree of subjective interpretation is involved in systematic reviews. Where patterns of disagreement emerged between authors and readers, authors tended to be more optimistic in their conclusions than the readers. Policy implications are discussed.