The propensity score is a balancing score: conditional on the propensity score, treated and untreated subjects have the same distribution of observed baseline characteristics. Four methods of using the propensity score have been described in the literature: stratification on the propensity score, propensity score matching, inverse probability of treatment weighting using the propensity score, and covariate adjustment using the propensity score. However, the relative ability of these methods to reduce systematic differences between treated and untreated subjects has not been examined. The authors used an empirical case study and Monte Carlo simulations to examine the relative ability of the 4 methods to balance baseline covariates between treated and untreated subjects. They used standardized differences to assess balance in the propensity score matched sample and in the weighted sample. For stratification on the propensity score, within-quintile standardized differences were computed comparing the distribution of baseline covariates between treated and untreated subjects within the same quintile of the propensity score. These quintile-specific standardized differences were then averaged across the quintiles. For covariate adjustment, the authors used the weighted conditional standardized absolute difference to compare balance between treated and untreated subjects. In both the empirical case study and the Monte Carlo simulations, they found that matching on the propensity score and weighting using the inverse probability of treatment eliminated a greater degree of the systematic differences between treated and untreated subjects than did the other 2 methods. In the Monte Carlo simulations, propensity score matching tended to have either comparable or marginally superior performance compared with propensity score weighting.
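
As a rough illustration of the balance diagnostic named above, the following sketch computes a standardized difference for one continuous covariate, first in the crude sample and then in a sample weighted by the inverse probability of treatment. It is not the authors' code. The function names, the simulated data, the use of the true (rather than an estimated) propensity score, and the particular weighted-variance formula are all assumptions made for this example.

```python
import math
import random


def weighted_mean(x, w):
    """Weighted mean of x with weights w."""
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)


def weighted_var(x, w):
    """Weighted variance (one common choice, assumed here).

    With equal weights it reduces to the usual sample variance
    with an n - 1 denominator.
    """
    m = weighted_mean(x, w)
    sw = sum(w)
    sw2 = sum(wi * wi for wi in w)
    ss = sum(wi * (xi - m) ** 2 for xi, wi in zip(x, w))
    return sw / (sw * sw - sw2) * ss


def standardized_difference(x, treated, w=None):
    """(mean_treated - mean_untreated) / sqrt((var_treated + var_untreated) / 2).

    x: covariate values; treated: 0/1 indicators; w: optional weights.
    """
    if w is None:
        w = [1.0] * len(x)
    xt = [xi for xi, t in zip(x, treated) if t]
    wt = [wi for wi, t in zip(w, treated) if t]
    xc = [xi for xi, t in zip(x, treated) if not t]
    wc = [wi for wi, t in zip(w, treated) if not t]
    pooled_sd = math.sqrt((weighted_var(xt, wt) + weighted_var(xc, wc)) / 2)
    return (weighted_mean(xt, wt) - weighted_mean(xc, wc)) / pooled_sd


# Hypothetical simulated data: one confounder x that drives treatment.
random.seed(0)
n = 20000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
ps = [1.0 / (1.0 + math.exp(-xi)) for xi in x]   # true propensity score (assumed known)
treated = [1 if random.random() < p else 0 for p in ps]

# Inverse probability of treatment weights: 1/ps for treated, 1/(1 - ps) for untreated.
iptw = [1.0 / p if t else 1.0 / (1.0 - p) for p, t in zip(ps, treated)]

d_crude = standardized_difference(x, treated)
d_weighted = standardized_difference(x, treated, iptw)
print(f"crude standardized difference:    {d_crude:.3f}")
print(f"weighted standardized difference: {d_weighted:.3f}")
```

In this simulated setting the weighted standardized difference should be much closer to zero than the crude one, which is the sense in which weighting "balances" the covariate. In practice the propensity score would be estimated from the data, and the same function could be applied within a matched sample (with unit weights) or within propensity score quintiles.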