Objective: To assess the degree of agreement between propensity score studies and randomized clinical trials in critical care research.
Data sources: Propensity score studies published in highly cited critical care or general medicine journals or included in a previous systematic review; corresponding randomized clinical trials included in Cochrane Systematic Reviews or indexed in PubMed.
Study selection: We identified propensity score studies of the effects of therapeutic interventions on short- or long-term mortality. We systematically matched propensity score studies to randomized clinical trials based on patient selection criteria, interventions, and outcomes.
Data extraction: We appraised the methods of included studies and extracted treatment effect estimates to compare the results of propensity score studies and randomized clinical trials. When multiple studies were identified for the same topic, we performed meta-analyses to obtain summary treatment effect estimates.
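As an illustration of the pooling step, the following is a minimal sketch of an inverse-variance meta-analysis on the log scale with a DerSimonian-Laird random-effects adjustment. The abstract does not state the effect measure or the pooling model, so the use of log odds ratios, the random-effects model, and the function name `pool_log_effects` are all assumptions made for illustration only.

```python
import math


def pool_log_effects(log_effects, std_errors):
    """Pool study-level log effect estimates (e.g. log odds ratios).

    Assumption: DerSimonian-Laird random-effects model; the source does not
    specify which model was used. Returns (pooled_log_effect, pooled_se).
    """
    if len(log_effects) != len(std_errors) or not log_effects:
        raise ValueError("need one standard error per effect estimate")

    k = len(log_effects)
    # Fixed-effect (inverse-variance) weights and estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_effects)) / sw

    # Between-study variance (tau^2) from Cochran's Q.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    if k < 2 or c <= 0:
        tau2 = 0.0  # a single study carries no heterogeneity information
    else:
        tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_effects)) / sw_star
    return pooled, math.sqrt(1.0 / sw_star)


# Hypothetical inputs: two trials reporting odds ratios of 0.75 and 0.90.
pooled, se = pool_log_effects([math.log(0.75), math.log(0.90)], [0.20, 0.15])
summary_or = math.exp(pooled)
```

Pooling on the log scale keeps the estimate symmetric around the null; exponentiating the result gives the summary odds ratio for a given design and topic.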
Data synthesis: We matched 21 propensity score studies with 58 randomized clinical trials in 18 distinct comparisons (median, one propensity score study and two randomized clinical trials per comparison) for short- and long-term mortality. Among these 18 comparisons, we found one statistically significant difference between designs (hyperoncotic albumin vs crystalloid fluids). Propensity score studies did not produce systematically higher or lower treatment effect estimates than randomized clinical trials, but estimates from the two designs differed by more than 30% in one-third of the comparisons examined. Observational studies in critical care met widely accepted methodological standards for propensity score analyses.
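One way to make the between-design comparison concrete is a ratio of summary estimates with a z-test on the log scale. The sketch below is illustrative only: the abstract does not report which test was used or how the 30% difference was defined, so the odds ratio scale, the independence of the two estimates, the symmetric 1.30 ratio threshold, and the name `compare_designs` are all assumptions.

```python
import math


def compare_designs(log_ps, se_ps, log_rct, se_rct, threshold=1.30):
    """Compare a propensity score summary estimate with an RCT summary estimate.

    Inputs are log effect estimates (assumed: log odds ratios) and their
    standard errors. Assumes the two estimates are independent.
    """
    diff = log_ps - log_rct
    se_diff = math.sqrt(se_ps ** 2 + se_rct ** 2)
    z = diff / se_diff
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "ratio": math.exp(diff),  # ratio of odds ratios (PS / RCT)
        "z": z,
        "p_value": p_value,
        # Assumed definition: larger estimate exceeds smaller by more than 30%.
        "large_gap": math.exp(abs(diff)) > threshold,
    }


# Hypothetical summary estimates for one comparison.
result = compare_designs(math.log(0.70), 0.12, math.log(0.95), 0.10)
```

Under this definition a comparison can show a gap of more than 30% without reaching statistical significance, which is consistent with the pattern described above: many sizeable differences but only one significant one.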
Conclusions: Across diverse critical care topics, propensity score studies published in high-impact journals produced results that were generally consistent with the findings of randomized clinical trials. However, caution is needed when interpreting propensity score studies because occasionally their results contradict those of randomized clinical trials and there is no reliable way to predict disagreements.