Bias due to Berkson error: issues when using predicted values in place of observed covariates

Biostatistics. 2021 Oct 13;22(4):858-872. doi: 10.1093/biostatistics/kxaa002.


Researchers often want to test for an association between an unmeasured covariate and an outcome. In the absence of a measurement, a study may substitute values generated from a prediction model. Such methods can be justified by noting that, under standard assumptions, this is equivalent to fitting a regression model for an outcome variable when at least one covariate is measured with Berkson error. In this setting, it is known that consistent or nearly consistent inference can be obtained under many linear and nonlinear outcome models. In this article, we focus on the linear regression outcome model and show that this consistency property does not hold when there is unmeasured confounding in the outcome model, in which case the marginal inference based on a covariate measured with Berkson error differs from the same inference based on observed covariates. Since unmeasured confounding is ubiquitous in applications, this severely limits the practical use of such measurements and, in particular, the substitution of predicted values for observed covariates. These issues are illustrated using data from the National Health and Nutrition Examination Survey to study the joint association of total percent body fat and body mass index with HbA1c. It is shown that using predicted total percent body fat in place of observed percent body fat yields inferences that often differ significantly, in some cases suggesting opposite relationships among covariates.
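
The contrast described in the abstract can be illustrated with a small simulation. What follows is a minimal sketch, not the article's analysis: the data-generating model, the names `w` (predicted value), `u` (Berkson error), and `c` (unmeasured confounder), the coefficient values, and the choice to let the confounder be related to the covariate only through `u` are all assumptions made for illustration.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a simple linear regression of y on x (with intercept)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


def simulate(n=200_000, beta=1.0, gamma=0.0, seed=0):
    """Return (slope using observed x, slope using predicted w).

    Assumed toy model (hypothetical, for illustration only):
      x = w + u                  Berkson error: u is independent of w
      c = u + noise              unmeasured confounder, related to x via u
      y = beta*x + gamma*c + e   linear outcome model; c is never observed
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)          # value produced by the prediction model
    u = rng.normal(size=n)          # Berkson error, independent of w
    x = w + u                       # true covariate
    c = u + rng.normal(size=n)      # unmeasured confounder
    y = beta * x + gamma * c + rng.normal(size=n)
    return ols_slope(x, y), ols_slope(w, y)


if __name__ == "__main__":
    for gamma in (0.0, 1.0):
        slope_x, slope_w = simulate(gamma=gamma)
        print(f"gamma={gamma}: observed x -> {slope_x:.3f}, predicted w -> {slope_w:.3f}")
```

In this toy model the population slope from regressing the outcome on the observed covariate is `beta + gamma/2`, while the slope from regressing on the predicted value remains `beta`. With `gamma = 0` (no unmeasured confounding) the two marginal analyses agree; with `gamma = 1` they target different quantities, which is the kind of disagreement the abstract describes.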

Keywords: Asymptotic bias; Berkson error model; Measurement error; Prediction equations; Unmeasured confounding.

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Bias
  • Body Mass Index
  • Humans
  • Linear Models
  • Nutrition Surveys*