After variable selection, standard inferential procedures for regression parameters may not be uniformly valid: there is no finite sample size at which a standard test is guaranteed to approximately attain its nominal size. This problem is exacerbated in high-dimensional settings, where variable selection becomes unavoidable. It has prompted a flurry of activity in developing uniformly valid hypothesis tests for a low-dimensional regression parameter (e.g., the causal effect of an exposure A on an outcome Y) in high-dimensional models. So far, however, there has been limited focus on model misspecification, although misspecification is inevitable in high-dimensional settings. We propose tests of the null hypothesis that are uniformly valid under sparsity conditions weaker than those typically invoked in the literature, assuming that the working models for the exposure and the outcome are both correctly specified. By amending the procedure for estimating the nuisance parameters, our tests remain valid when one of the two models is misspecified; hence, they are doubly robust. Our proposals are straightforward to implement with existing software for penalized maximum likelihood estimation and do not require sample splitting. We illustrate them in simulations and in an analysis of data obtained from the Ghent University intensive care unit.
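As a rough illustration only, the sketch below implements a generic score-type test of no exposure effect, H0: E[Y | A, L] = E[Y | L], with linear working models for the outcome and the exposure selected by the lasso on the full sample (no sample splitting). It is not the authors' procedure: the function names, the penalty heuristic, and the least-squares refit on the union of selected covariates are assumptions made for this example, and it omits the amended nuisance-parameter estimation that delivers double robustness under misspecification. The statistic standardizes the sum of products of exposure and outcome residuals; at the population level, this product has mean zero under H0 if either working model is correctly specified.

```python
import math

import numpy as np


def lasso_support(X, y, lam, n_iter=500, tol=1e-8):
    """Indices of columns selected by the lasso, fitted by cyclic coordinate descent.

    Minimises (1/(2n)) * ||y - mean(y) - Z b||^2 + lam * ||b||_1, where Z is X
    with each column centred and scaled to unit (population) variance.
    """
    n, p = X.shape
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # constant columns stay all-zero and are never selected
    Z = (X - X.mean(axis=0)) / sd
    r = y - y.mean()  # current residual; all coefficients start at zero
    b = np.zeros(p)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            rho = Z[:, j] @ r / n + b[j]  # partial-residual inner product for column j
            new = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft-thresholding update
            if new != b[j]:
                r -= Z[:, j] * (new - b[j])  # keep the residual in sync with b
                max_change = max(max_change, abs(new - b[j]))
                b[j] = new
        if max_change < tol:
            break
    return np.flatnonzero(b)


def ols_fitted(X, y):
    """Fitted values from least squares of y on an intercept and the columns of X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef


def score_test_no_effect(Y, A, L):
    """Hypothetical score-type test of H0: E[Y | A, L] = E[Y | L].

    Confounders are selected by the lasso in an outcome model (Y on L, with no A
    term, as implied by H0) and in an exposure model (A on L); both nuisance
    models are then refitted by least squares on the union of selected columns.
    """
    n, p = L.shape
    scale = math.sqrt(2.0 * math.log(max(p, 2)) / n)  # assumed heuristic penalty level
    keep = np.union1d(lasso_support(L, Y, Y.std() * scale),
                      lasso_support(L, A, A.std() * scale))
    r_y = Y - ols_fitted(L[:, keep], Y)  # outcome working model for E[Y | L]
    r_a = A - ols_fitted(L[:, keep], A)  # exposure working model for E[A | L]
    u = r_a * r_y  # estimated score contributions; mean zero under H0
    t = float(u.sum() / math.sqrt((u ** 2).sum()))
    p_value = math.erfc(abs(t) / math.sqrt(2.0))  # two-sided standard normal p-value
    return t, p_value
```

For instance, drawing L from a standard normal, letting A depend on two columns of L, and setting Y = βA + L1 + L2 + noise, one would expect a very small p-value when β = 1 and p-values roughly uniform across replications when β = 0; this is an expectation for the sketch, not a result reported in the paper.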
Keywords: causal inference; doubly robust estimation; high-dimensional inference; post-selection inference.
© 2020 The International Biometric Society.