Randomized Controlled Trial
Pharmacoepidemiol Drug Saf. 2013 Feb;22(2):138-44.
doi: 10.1002/pds.3396. Epub 2012 Dec 28.

Investigating differences in treatment effect estimates between propensity score matching and weighting: a demonstration using STAR*D trial data


Alan R Ellis et al. Pharmacoepidemiol Drug Saf. 2013 Feb.

Abstract

Purpose: The choice of propensity score (PS) implementation influences treatment effect estimates not only because different methods estimate different quantities, but also because different estimators respond in different ways to phenomena such as treatment effect heterogeneity and limited availability of potential matches. Using effectiveness data, we describe lessons learned from sensitivity analyses with matched and weighted estimates.

Methods: With subsample data (N = 1292) from Sequenced Treatment Alternatives to Relieve Depression, a 2001-2004 effectiveness trial of depression treatments, we implemented PS matching and weighting to estimate the treatment effect in the treated and conducted multiple sensitivity analyses.
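As a rough illustration of the weighting approach, the sketch below computes standardized mortality ratio (SMR) weights, which target the treatment effect in the treated, and a weighted risk ratio. Treated observations receive weight 1 and untreated observations receive the propensity odds. The function names and the toy data are hypothetical and are not taken from the trial; the sketch also assumes propensity scores have already been estimated.

```python
def smr_weights(treated, ps):
    """Weights targeting the treatment effect in the treated.

    Treated observations get weight 1; untreated observations get the
    propensity odds ps / (1 - ps).
    """
    return [1.0 if t else p / (1.0 - p) for t, p in zip(treated, ps)]


def weighted_risk_ratio(treated, outcome, weights):
    """Ratio of weighted outcome risks, treated over untreated."""
    num_t = sum(w * y for t, y, w in zip(treated, outcome, weights) if t)
    den_t = sum(w for t, w in zip(treated, weights) if t)
    num_u = sum(w * y for t, y, w in zip(treated, outcome, weights) if not t)
    den_u = sum(w for t, w in zip(treated, weights) if not t)
    return (num_t / den_t) / (num_u / den_u)


# Toy data (hypothetical, not from the trial).
treated = [1, 1, 1, 0, 0, 0, 0]
ps = [0.8, 0.6, 0.5, 0.5, 0.2, 0.2, 0.75]
outcome = [1, 0, 1, 1, 0, 0, 1]

weights = smr_weights(treated, ps)
rr = weighted_risk_ratio(treated, outcome, weights)
print(weights, rr)
```

Note how the single untreated observation with a propensity score of 0.75 carries a weight of 3, more than the other three untreated observations combined.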

Results: Matching and weighting both balanced covariates but yielded different samples and treatment effect estimates (matched RR 1.00, 95% CI: 0.75-1.34; weighted RR 1.28, 95% CI: 0.97-1.69). In sensitivity analyses, as increasing numbers of observations at both ends of the PS distribution were excluded from the weighted analysis, weighted estimates approached the matched estimate (weighted RR 1.04, 95% CI: 0.77-1.39 after excluding all observations below the 5th percentile of the treated and above the 95th percentile of the untreated). Treatment appeared to have benefits only in the highest and lowest PS strata.
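The exclusion rule described in the Results can be sketched as follows: drop every observation whose propensity score falls below a low percentile of the treated group's scores or above a high percentile of the untreated group's scores. This is only an illustration. The percentile definition (linear interpolation between order statistics), the function names, and the toy data are assumptions; the abstract does not say how percentiles were computed.

```python
def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def exclude_extremes(treated, ps, lower_q=5.0, upper_q=95.0):
    """Return indices kept after dropping observations in the PS tails.

    Lower cutoff: lower_q-th percentile of the PS among the treated.
    Upper cutoff: upper_q-th percentile of the PS among the untreated.
    """
    ps_treated = [p for t, p in zip(treated, ps) if t]
    ps_untreated = [p for t, p in zip(treated, ps) if not t]
    lower = percentile(ps_treated, lower_q)
    upper = percentile(ps_untreated, upper_q)
    return [i for i, p in enumerate(ps) if lower <= p <= upper]


# Toy data (hypothetical): five treated, then five untreated observations.
treated = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
ps = [0.02, 0.40, 0.55, 0.70, 0.90, 0.05, 0.20, 0.30, 0.60, 0.97]

kept = exclude_extremes(treated, ps)
print(kept)
```

Both cutoffs apply to both groups, so the rule removes the treated observation with the lowest score and the untreated observation with the highest score (those treated contrary to prediction) along with any other observations outside the range.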

Conclusions: Matched and weighted estimates differed due to incomplete matching, sensitivity of weighted estimates to extreme observations, and possibly treatment effect heterogeneity. PS analysis requires identifying the population and treatment effect of interest, selecting an appropriate implementation method, and conducting and reporting sensitivity analyses. Weighted estimation especially should include sensitivity analyses relating to influential observations, such as those treated contrary to prediction.
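To make "incomplete matching" concrete, here is a minimal sketch of greedy 1:1 nearest-neighbour matching on the propensity score with a caliper and without replacement. The abstract does not state which matching algorithm or caliper the authors used, so the algorithm, the caliper value, and all names below are assumptions chosen for illustration.

```python
def greedy_caliper_match(treated, ps, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the PS, without replacement.

    Returns (pairs, unmatched): pairs is a list of
    (treated_index, untreated_index); unmatched lists the treated indices
    with no remaining untreated observation within the caliper.
    """
    available = {i for i, t in enumerate(treated) if not t}
    pairs, unmatched = [], []
    for i, t in enumerate(treated):
        if not t:
            continue
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = min(sorted(available),
                   key=lambda j: abs(ps[j] - ps[i]),
                   default=None)
        if best is not None and abs(ps[best] - ps[i]) <= caliper:
            pairs.append((i, best))
            available.remove(best)
        else:
            unmatched.append(i)
    return pairs, unmatched


# Toy data (hypothetical): three treated, then three untreated observations.
treated = [1, 1, 1, 0, 0, 0]
ps = [0.30, 0.62, 0.95, 0.28, 0.60, 0.10]

pairs, unmatched = greedy_caliper_match(treated, ps)
print(pairs, unmatched)
```

In this toy example the treated observation with a score of 0.95 has no untreated counterpart within the caliper and drops out of the matched sample, whereas a weighted analysis would retain it with weight 1. The two analyses therefore describe different samples.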


Conflict of interest statement

In the past 5 years, AE has received research funding from Merck and from the Center for Pharmacoepidemiology at the UNC Gillings School of Global Public Health, which receives industry funding. SD received funding through a Ruth L. Kirschstein-National Service Research Award Post-Doctoral Traineeship sponsored by NIMH and Harvard Medical School, Department of Health Care Policy, Grant No. T32MH01973; she has no conflict of interest to report for this paper. RH has received research support during the last 5 years from NIH and AHRQ. He also has research and consulting support from Takeda Pharmaceuticals, GlaxoSmithKline, and Novartis. Over the past 5 years, BG has received grant and research support from the Agency for Healthcare Research and Quality, NIMH, Bristol Myers Squibb, Novartis, and M-3 Information. He has served as an advisor for Bristol Myers Squibb. Over the past 5 years, JF has received unrestricted grant support from the Pfizer Foundation and consulting fees from Takeda Pharmaceuticals and Novartis Pharmaceuticals. TS receives investigator-initiated research funding and support as Principal Investigator (R01 AG023178) and Co-Investigator (R01 AG018833) from the National Institute on Aging at the National Institutes of Health. He also receives research funding as Principal Investigator of the UNC-DEcIDE center from the Agency for Healthcare Research and Quality. TS does not accept personal compensation of any kind from any pharmaceutical company, though he receives salary support from the Center for Pharmacoepidemiology and from unrestricted research grants from pharmaceutical companies to UNC.

Figures

Figure 1. Propensity Score Distribution in the Augment and Switch Groups Before Propensity Score Application, After Matching, and After Weighting
The figure, based on kernel density estimation, shows distributions of propensity scores estimated using logistic regression. Horizontal axes indicate ranges of propensity score values. Weights are for the treatment effect in the medication augmentation group.
Figure 2. Sensitivity of Weighted Estimates to Truncation of Weights and to Exclusion of Observations with Extreme Propensity Scores
CI=confidence interval; RR=risk ratio. The figure shows RRs for remission based on estimated propensity scores that were applied using standardized mortality ratio weights. Percentile cutoffs are relative to the propensity score distribution in the medication switch group. Truncation means that weights below the lower cutoff were increased to equal the lower cutoff, and weights above the higher cutoff were decreased to equal the higher cutoff. Exclusion means that observations outside the indicated range were omitted from the analysis. Box size indicates relative precision. Lines indicate 95% confidence intervals.
Figure 3. Heterogeneity of Treatment Effect
CI=confidence interval; RR=risk ratio. The figure shows RRs for remission after restricting to the common support region and stratifying by the estimated propensity score. Box size indicates relative precision. Lines indicate 95% confidence intervals.
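
The Figure 2 caption distinguishes truncation from exclusion. Truncation keeps every observation but pulls extreme weights in to the cutoffs, as in the minimal sketch below. The cutoffs are passed in directly as weight values here; deriving them from percentiles, as in the figure, is left out. The function name and toy weights are hypothetical.

```python
def truncate_weights(weights, lower, upper):
    """Clamp weights to [lower, upper] without dropping any observation.

    Weights below `lower` are raised to `lower`; weights above `upper`
    are lowered to `upper`.
    """
    return [min(max(w, lower), upper) for w in weights]


# Toy weights (hypothetical).
weights = [0.05, 0.5, 1.0, 3.0, 12.0]
truncated = truncate_weights(weights, lower=0.1, upper=5.0)
print(truncated)
```

Unlike exclusion, truncation leaves the sample size unchanged; it only limits how much any single observation can influence the weighted estimate.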
