Optimal multiwave sampling for regression modeling in two-phase designs

Stat Med. 2020 Dec 30;39(30):4912-4921. doi: 10.1002/sim.8760. Epub 2020 Oct 5.

Abstract

Two-phase designs involve measuring extra variables on a subset of a cohort in which some variables have already been measured. The goal of a two-phase design is to choose a subsample of individuals from the cohort and to analyse that subsample efficiently. Of particular interest is an optimal design, one that gives the most efficient estimates of the regression parameters. In this article, we propose a multiwave sampling design to approximate the optimal design for design-based estimators. Influence functions are used to compute the optimal sampling allocations. We propose using informative priors on the regression parameters to derive the wave-1 sampling probabilities, because arbitrary prespecified sampling probabilities may be far from optimal and reduce design efficiency. The posterior distributions of the regression parameters derived from each wave are then used as priors for the next wave. Generalized raking is used in the final statistical analysis. We show that two-wave sampling with reasonable informative priors yields highly efficient estimates of the parameter of interest and comes close to the underlying optimal design.
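The allocation step named in the abstract (influence functions feeding a Neyman allocation) can be illustrated with a short Python sketch. It assumes, beyond what the abstract states, that the phase-1 cohort is already partitioned into strata and that an influence-function value for the target coefficient is available for every phase-1 record (for instance, computed under prior guesses of the regression parameters). It then chooses cumulative stratum sample sizes n_h to minimise sum_h N_h^2 S_h^2 / n_h, where S_h is the within-stratum standard deviation of the influence values (population SD used as a simple plug-in). Earlier waves enter as lower bounds on n_h and stratum sizes as upper bounds. The function name `neyman_allocation`, its arguments, the bisection solver and the largest-remainder rounding are all hypothetical choices for illustration, not the authors' code.

```python
import math
from statistics import pstdev


def _clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def neyman_allocation(influence, n_wave, already=None):
    """Allocate one wave of phase-2 sampling across strata (hypothetical helper).

    influence: dict mapping stratum -> non-empty list of influence-function
        values for the phase-1 records in that stratum.
    n_wave: number of records to sample in this wave.
    already: optional dict mapping stratum -> records sampled in earlier waves.

    Minimises sum_h N_h^2 S_h^2 / n_h over cumulative sizes n_h subject to
    already_h <= n_h <= N_h and sum_h n_h = n_wave + sum_h already_h; the
    solution has the form n_h = clip(c * N_h * S_h, already_h, N_h).
    Returns the number of records to draw from each stratum in this wave.
    """
    already = already or {}
    strata = list(influence)
    size = {h: len(influence[h]) for h in strata}
    weight = {h: size[h] * pstdev(influence[h]) for h in strata}  # N_h * S_h
    lower = {h: already.get(h, 0) for h in strata}
    if any(lower[h] > size[h] for h in strata):
        raise ValueError("a stratum was sampled beyond its phase-1 size")
    n_total = n_wave + sum(lower.values())
    # Largest reachable total: strata with S_h = 0 never receive extra units.
    reachable = sum(size[h] if weight[h] > 0 else lower[h] for h in strata)
    if n_wave < 0 or n_total > reachable:
        raise ValueError("requested wave size is not feasible")

    def filled(c):
        return sum(_clip(c * weight[h], lower[h], size[h]) for h in strata)

    # Bisection on the multiplier c; filled(c) is continuous and non-decreasing.
    lo, hi = 0.0, 1.0
    while filled(hi) < n_total:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if filled(mid) < n_total:
            lo = mid
        else:
            hi = mid
    real = {h: _clip(hi * weight[h], lower[h], size[h]) for h in strata}

    # Round to integers: floor everything, then give the leftover units to
    # the strata with the largest fractional parts (largest-remainder rule).
    counts = {h: math.floor(real[h]) for h in strata}
    leftover = n_total - sum(counts.values())
    by_fraction = sorted(strata, key=lambda h: real[h] - counts[h], reverse=True)
    for h in by_fraction[:leftover]:
        counts[h] += 1
    return {h: counts[h] - lower[h] for h in strata}


if __name__ == "__main__":
    # Made-up influence values: only the within-stratum spread matters here.
    influence = {
        "low": [-1.0, 1.0] * 50,   # N = 100, S = 1
        "mid": [-2.0, 2.0] * 50,   # N = 100, S = 2
        "high": [-5.0, 5.0] * 10,  # N = 20,  S = 5
    }
    wave1 = neyman_allocation(influence, 60)
    print(wave1)  # {'low': 15, 'mid': 30, 'high': 15}
    wave2 = neyman_allocation(influence, 40, already=wave1)
    print(wave2)  # {'low': 12, 'mid': 23, 'high': 5}
```

In an actual multiwave design the influence values passed to the second call would be recomputed between waves (in the abstract's scheme, under the posterior obtained from wave 1); the example reuses the same values only to stay short. The sketch covers the allocation step alone and does not implement the prior-to-posterior updating or the generalized raking estimator used in the final analysis.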

Keywords: Neyman allocation; design-based estimators; influence function; optimal design; prior.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cohort Studies
  • Humans
  • Probability
  • Research Design*