High quality (certainty) evidence changes less often than low-quality evidence, but the magnitude of effect size does not systematically differ between studies with low versus high-quality evidence

J Eval Clin Pract. 2022 Jun;28(3):353-362. doi: 10.1111/jep.13657. Epub 2022 Jan 28.


Rationale, aims, and objectives: It is generally believed that low quality (certainty) of evidence (CoE) generates inaccurate estimates of treatment effects more often than high CoE. As a result, we would expect that (a) estimates of the effects of health interventions initially based on high CoE change less frequently than those based on lower CoE, and (b) the estimated magnitude of effect size differs between high and low CoE. Empirical assessment of these foundational principles of evidence-based medicine has been lacking.

Methods: We reviewed the Cochrane Database of Systematic Reviews from January 2016 through May 2021 for pairs of original and updated reviews, and assessed change in CoE ratings based on the Grading of Recommendations Assessment, Development and Evaluation (GRADE) method. We assessed the difference in effect sizes between the original and updated reviews as a function of change in CoE, which we report as a ratio of odds ratios (ROR). We compared the RORs generated in the studies in which CoE changed from very low/low (VL/L) to moderate/high (M/H) versus from M/H to VL/L. Heterogeneity and inconsistency were assessed using the tau and I2 statistics. We also assessed the change in precision of effect estimates, by calculating the ratio of standard errors (seR), and the absolute deviation in estimates of treatment effects (aROR).
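The three pairwise measures named above (ROR, seR, aROR) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the following are assumptions: the ratio is taken as updated over original, standard errors are recovered on the log scale from 95% confidence intervals, and aROR is taken as the exponential of the absolute log ROR. The function names and the input numbers are hypothetical and do not come from the study.

```python
import math

def log_or_and_se(or_est, ci_low, ci_high, z=1.96):
    """Return (log OR, SE of log OR), with the SE recovered from a 95% CI."""
    log_or = math.log(or_est)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return log_or, se

def compare_pair(original, updated):
    """Compare an original and an updated estimate.

    Each argument is a tuple (OR, ci_low, ci_high).
    Assumption: ratios are updated / original; the abstract does not
    state the direction.
    """
    log_o, se_o = log_or_and_se(*original)
    log_u, se_u = log_or_and_se(*updated)
    log_ror = log_u - log_o
    return {
        "ROR": math.exp(log_ror),         # ratio of odds ratios
        "aROR": math.exp(abs(log_ror)),   # absolute deviation, always >= 1
        "seR": se_u / se_o,               # < 1 means the update is more precise
    }

# Hypothetical pair of reviews (illustrative numbers only).
result = compare_pair(original=(0.50, 0.25, 1.00), updated=(0.80, 0.64, 1.00))
print(result)
```

Under these assumptions, an aROR close to 1 means the updated estimate barely moved in either direction, while an seR below 1 means the updated review narrowed the confidence interval.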

Results: Four hundred and nineteen pairs of reviews were included, of which 414 (207 × 2) informed the CoE appraisal and 384 (192 × 2) the assessment of effect size. We found that CoE originally appraised as VL/L had 2.1 times higher odds [95% confidence interval (CI): 1.19-4.12; p = 0.0091] of being changed in future studies than M/H CoE. However, the effect size was not different (p = 1) when CoE changed from VL/L → M/H [ROR = 1.02 (95% CI: 0.74-1.39)] compared with M/H → VL/L [ROR = 1.02 (95% CI: 0.44-2.37)]. A similar overlap in aROR between the VL/L → M/H and M/H → VL/L subgroups was observed [median (IQR): 1.12 (1.07-1.57) vs. 1.21 (1.12-2.43)]. We observed large inconsistency across ROR estimates (I2 = 99%). There was larger imprecision in treatment effects when CoE changed from VL/L → M/H (seR = 1.46) than when it changed from M/H → VL/L (seR = 0.72).
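The reported p = 1 for the subgroup comparison is consistent with the two pooled RORs having the same point estimate. The abstract does not say which test was used; the sketch below assumes a simple Wald-type z-test on the log scale, with standard errors recovered from the reported 95% confidence intervals. The helper names are hypothetical.

```python
import math

def se_from_ci(ci_low, ci_high, z=1.96):
    """SE of a log ratio, recovered from its 95% confidence interval."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * z)

def subgroup_difference(ror_a, ci_a, ror_b, ci_b):
    """Two-sided z-test for a difference between two log RORs.

    Assumption: the subgroups are independent and the log RORs are
    approximately normal; this is not necessarily the authors' test.
    """
    se_a = se_from_ci(*ci_a)
    se_b = se_from_ci(*ci_b)
    z = (math.log(ror_a) - math.log(ror_b)) / math.sqrt(se_a**2 + se_b**2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

# Values as reported in the Results: VL/L -> M/H versus M/H -> VL/L.
z, p = subgroup_difference(1.02, (0.74, 1.39), 1.02, (0.44, 2.37))
print(z, p)  # prints "0.0 1.0"
```

Because both point estimates equal 1.02, the difference in log RORs is exactly zero regardless of how wide either interval is, so the test statistic is 0 and the p-value is 1.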

Conclusions: We found that low-quality evidence changes more often than high CoE. However, the effect size did not systematically differ between the studies with low versus high CoE. The finding that the effect size did not differ between low and high CoE indicates an urgent need to refine current evidence-based medicine (EBM) critical appraisal methods.

Keywords: critical appraisal-bias; evidence-based medicine; meta-epidemiology; observational studies; random error; randomized trials; systematic review.

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Humans
  • Systematic Reviews as Topic*