Large, Sparse Optimal Matching with Refined Covariate Balance in an Observational Study of the Health Outcomes Produced by New Surgeons

J Am Stat Assoc. 2015 Apr 3;110(510):515-527. doi: 10.1080/01621459.2014.997879.

Abstract

Every newly trained surgeon performs her first unsupervised operation. How do the health outcomes of her patients compare with the patients of experienced surgeons? Using data from 498 hospitals, we compare 1252 pairs comprised of a new surgeon and an experienced surgeon working at the same hospital. We introduce a new form of matching that matches patients of each new surgeon to patients of an otherwise similar experienced surgeon at the same hospital, perfectly balancing 176 surgical procedures and closely balancing a total of 2.9 million categories of patients; additionally, the individual patient pairs are as close as possible. A new goal for matching is introduced, called "refined covariate balance," in which a sequence of nested, ever more refined, nominal covariates is balanced as closely as possible, emphasizing the first or coarsest covariate in that sequence. A new algorithm for matching is proposed and the main new results prove that the algorithm finds the closest match in terms of the total within-pair covariate distances among all matches that achieve refined covariate balance. Unlike previous approaches to forcing balance on covariates, the new algorithm creates multiple paths to a match in a network, where paths that introduce imbalances are penalized and hence avoided to the extent possible. The algorithm exploits a sparse network to quickly optimize a match that is about two orders of magnitude larger than is typical in statistical matching problems, thereby permitting much more extensive use of fine and near-fine balance constraints. The match was constructed in a few minutes using a network optimization algorithm implemented in R. An R package called rcbalance implementing the method is available from CRAN.

Keywords: Fine balance; network optimization; optimal matching; sparse networks.