Bayesian Estimation of Propensity Scores for Integrating Multiple Cohorts with High-Dimensional Covariates

Stat Biosci. 2024 Dec 9:10.1007/s12561-024-09470-5. doi: 10.1007/s12561-024-09470-5. Online ahead of print.

Abstract

Comparative meta-analyses that integrate multiple observational studies rely on estimated propensity scores (PSs) to mitigate covariate imbalances between groups of subjects. However, PS estimation faces theoretical and practical challenges when the covariates are high-dimensional. Motivated by an integrative analysis of breast cancer patients across seven medical centers, this paper tackles the challenges of integrating multiple observational datasets. The proposed inferential technique, called Bayesian Motif Submatrices for Covariates (B-MSC), addresses the curse of dimensionality through a hybrid of Bayesian and frequentist approaches. B-MSC uses nonparametric Bayesian "Chinese restaurant" processes to eliminate redundancy in the high-dimensional covariates and discover latent motifs, or lower-dimensional structures. With these motifs as potential predictors, standard regression techniques can be used to infer the PSs accurately and facilitate covariate-balanced group comparisons. Simulations and a meta-analysis of the motivating cancer investigation demonstrate that B-MSC estimates the propensity scores accurately and addresses covariate imbalance efficiently when integrating observational health studies with high-dimensional covariates.

Keywords: B-MSC; Covariate imbalance; Data integration; High-dimensional covariates; Hybrid Bayesian-frequentist.
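
The abstract mentions nonparametric Bayesian "Chinese restaurant" processes as the device for grouping redundant covariates. As a rough illustration of that building block only, the sketch below draws one random partition of covariate indices from a Chinese restaurant process prior. It is not the B-MSC procedure: it performs no posterior inference and looks at no data, and the function name `sample_crp_partition`, the concentration parameter `alpha`, and the choice of 200 covariates are assumptions made for this example.

```python
import random


def sample_crp_partition(n_items, alpha, seed=0):
    """Draw one partition of n_items from a Chinese restaurant process prior.

    Hypothetical helper for illustration; alpha > 0 is the concentration
    parameter. Returns (labels, sizes), where labels[i] is the cluster of
    item i and sizes[k] is the number of items in cluster k.
    """
    rng = random.Random(seed)
    labels = []  # cluster index assigned to each item, in arrival order
    sizes = []   # current number of items in each cluster
    for _ in range(n_items):
        # An existing cluster k is chosen with weight sizes[k];
        # a brand-new cluster is opened with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1     # join an existing cluster
        labels.append(k)
    return labels, sizes


# Example: partition 200 covariate indices (assumed sizes, not from the paper).
labels, sizes = sample_crp_partition(n_items=200, alpha=2.0, seed=1)
print(len(sizes), "clusters for", len(labels), "covariates")
```

Because larger clusters attract new members in proportion to their size, a draw of this kind typically places many items in a small number of clusters, which is the property that makes the process a natural prior for collapsing high-dimensional covariates into a few groups. In a full analysis the cluster assignments would be inferred from the covariate data rather than sampled from the prior alone.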