Controlling for Informed Presence Bias Due to the Number of Health Encounters in an Electronic Health Record

Am J Epidemiol. 2016 Dec 1;184(11):847-855. doi: 10.1093/aje/kww112. Epub 2016 Nov 16.


Electronic health records (EHRs) are an increasingly utilized resource for clinical research. While their size allows for many analytical opportunities, as with most observational data, there is also the potential for bias. One of the key sources of bias in EHRs is what we term informed presence: the notion that inclusion in an EHR is not random but rather indicates that the subject is ill, making people in EHRs systematically different from those not in EHRs. In this article, we use simulated and empirical data to illustrate the conditions under which such bias can arise and how conditioning on the number of health-care encounters can be one way to remove this bias. In doing so, we also show when such an approach can impart M bias, or bias from conditioning on a collider. Finally, we explore the conditions under which number of medical encounters can serve as a proxy for general health. We apply these methods to an EHR data set from a university medical center covering the years 2007-2013.
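
The mechanism described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' simulation design: two conditions are truly independent, a condition is recorded only if it is detected at one of the patient's encounters, and the number of encounters is drawn uniformly from 1 to 10. The function names, the prevalence of 0.3, and the per-encounter detection probability of 0.2 are all hypothetical choices made for illustration. Conditioning on the number of encounters is done here with a Mantel-Haenszel odds ratio stratified by encounter count, which is one of several ways to adjust.

```python
import random
from collections import defaultdict


def simulate(n_patients=200_000, prevalence=0.3, detect_prob=0.2, seed=0):
    """Return (encounters, recorded_a, recorded_b) for each simulated patient.

    All parameter values are illustrative assumptions, not taken from the article.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_patients):
        encounters = rng.randint(1, 10)  # assumed visit-count distribution
        # Chance that a truly present condition is recorded at least once.
        p_recorded = 1 - (1 - detect_prob) ** encounters
        # True conditions are independent of each other and of encounters.
        has_a = rng.random() < prevalence
        has_b = rng.random() < prevalence
        rec_a = has_a and rng.random() < p_recorded
        rec_b = has_b and rng.random() < p_recorded
        rows.append((encounters, rec_a, rec_b))
    return rows


def two_by_two(rows):
    """Count cells: a (both recorded), b (A only), c (B only), d (neither)."""
    a = b = c = d = 0
    for _, rec_a, rec_b in rows:
        if rec_a and rec_b:
            a += 1
        elif rec_a:
            b += 1
        elif rec_b:
            c += 1
        else:
            d += 1
    return a, b, c, d


def crude_odds_ratio(rows):
    """Odds ratio between the two recorded diagnoses, ignoring encounters."""
    a, b, c, d = two_by_two(rows)
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(rows):
    """Odds ratio pooled across strata defined by the number of encounters."""
    strata = defaultdict(list)
    for row in rows:
        strata[row[0]].append(row)
    numerator = denominator = 0.0
    for stratum_rows in strata.values():
        a, b, c, d = two_by_two(stratum_rows)
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


rows = simulate()
crude = crude_odds_ratio(rows)
adjusted = mantel_haenszel_odds_ratio(rows)
print(f"crude OR: {crude:.2f}")
print(f"encounter-adjusted OR: {adjusted:.2f}")
```

Under these assumptions, the crude odds ratio should exceed 1 even though the two conditions are unrelated, because patients with more encounters have more opportunities for either diagnosis to be recorded. The encounter-stratified estimate should sit close to 1. The sketch does not cover the M-bias scenario mentioned in the abstract, in which the number of encounters acts as a collider and conditioning on it can introduce bias.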

Keywords: Berkson's bias; bias (epidemiology); confounding factors (epidemiology); electronic health records; epidemiologic methods.

MeSH terms

  • Biomedical Research / methods*
  • Biomedical Research / standards*
  • Computer Simulation
  • Confounding Factors, Epidemiologic
  • Depression / epidemiology
  • Diabetes Mellitus / epidemiology
  • Electronic Health Records / statistics & numerical data*
  • Epidemiologic Research Design*
  • Health Services / statistics & numerical data
  • Health Status
  • Humans
  • Reproducibility of Results
  • Selection Bias*