Doubly robust inference when combining probability and non-probability samples with high dimensional data

J R Stat Soc Series B Stat Methodol. 2020 Apr;82(2):445-465. doi: 10.1111/rssb.12354. Epub 2020 Jan 7.

Abstract

We consider integrating a non-probability sample with a probability sample that provides high-dimensional, representative covariate information about the target population. We propose a two-step approach for variable selection and finite population inference. In the first step, we use penalized estimating equations with folded concave penalties to select important variables and show selection consistency for general samples. In the second step, we focus on a doubly robust estimator of the finite population mean and re-estimate the nuisance model parameters by minimizing the asymptotic squared bias of the doubly robust estimator. This estimating strategy mitigates possible first-step selection error and renders the doubly robust estimator root-n consistent if either the sampling probability or the outcome model is correctly specified.
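To make the second step concrete, the following is a minimal sketch of a doubly robust estimate of a finite population mean. The abstract does not state the estimator's form, so the sketch assumes one that is common in the data integration literature: inverse-probability-weighted residuals from the non-probability sample plus design-weighted outcome predictions from the probability sample. The function name `dr_mean` and all argument names are hypothetical, and the fitted values `m_a`, `m_b` and `pi_a` are taken as given rather than estimated by the paper's procedure.

```python
def dr_mean(y_a, m_a, pi_a, m_b, d_b, n_pop=None):
    """Doubly robust estimate of a finite population mean (assumed form).

    y_a   : outcomes observed in the non-probability sample A
    m_a   : outcome-model predictions for the units in A
    pi_a  : estimated probabilities of inclusion in A
    m_b   : outcome-model predictions for the units in probability sample B
    d_b   : design weights for B (inverse of the known inclusion probabilities)
    n_pop : population size N; if None, it is estimated by sum(d_b)
    """
    if n_pop is None:
        n_pop = sum(d_b)
    # Inverse-probability-weighted residuals from sample A.
    residual_term = sum((y - m) / p for y, m, p in zip(y_a, m_a, pi_a))
    # Design-weighted outcome predictions from sample B.
    prediction_term = sum(d * m for d, m in zip(d_b, m_b))
    return (residual_term + prediction_term) / n_pop
```

The two terms show where double robustness comes from. If the outcome model fits well, the residuals are close to zero and the design-weighted predictions carry the estimate. If instead the sampling probabilities are right, the weighted residuals correct the bias of a poor outcome model.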

Keywords: Data integration; Double robustness; Generalizability; Penalized estimating equation; Variable selection.