Imputing Missing Race/Ethnicity in Pediatric Electronic Health Records: Reducing Bias with Use of U.S. Census Location and Surname Data

Health Serv Res. 2015 Aug;50(4):946-60. doi: 10.1111/1475-6773.12295. Epub 2015 Mar 11.


Objective: To assess the utility of imputing race/ethnicity using U.S. Census race/ethnicity, residential address, and surname information compared to standard missing data methods in a pediatric cohort.

Data sources/study setting: Electronic health record data from 30 pediatric practices with known race/ethnicity.

Study design: In a simulation experiment, we constructed dichotomous and continuous outcomes with pre-specified associations with known race/ethnicity. Bias was introduced by nonrandomly setting race/ethnicity to missing. We compared typical methods for handling missing race/ethnicity (multiple imputation alone with clinical factors, complete case analysis, indicator variables) to multiple imputation incorporating surname and address information.
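The kind of simulation described above can be illustrated with a minimal sketch. It is not the authors' code: the binary group variable, the effect size, the logistic missingness model, and the function name `simulate_complete_case_bias` are all assumptions chosen for illustration. It constructs a continuous outcome with a pre-specified association with a known group, sets the group to missing nonrandomly, and compares the full-data estimate with a complete case estimate.

```python
import math
import random


def simulate_complete_case_bias(n=20000, true_effect=2.0, seed=0):
    """Simulate one data set; compare full-data and complete-case estimates."""
    rng = random.Random(seed)
    full = {0: [], 1: []}      # outcomes for everyone, by group
    observed = {0: [], 1: []}  # outcomes only where group was not set to missing
    n_missing = 0

    for _ in range(n):
        # Hypothetical binary group standing in for known race/ethnicity.
        group = 1 if rng.random() < 0.3 else 0

        # Continuous outcome with a pre-specified group difference.
        outcome = 1.0 + true_effect * group + rng.gauss(0.0, 1.0)

        # Assumed nonrandom missingness: the chance that group is set to
        # missing rises with both the group value and the outcome.
        logit = -2.0 + 1.0 * group + 0.8 * outcome
        p_missing = 1.0 / (1.0 + math.exp(-logit))

        full[group].append(outcome)
        if rng.random() < p_missing:
            n_missing += 1
        else:
            observed[group].append(outcome)

    def mean(values):
        return sum(values) / len(values)

    return {
        "true_effect": true_effect,
        "full_data": mean(full[1]) - mean(full[0]),
        "complete_case": mean(observed[1]) - mean(observed[0]),
        "missing_fraction": n_missing / n,
    }


if __name__ == "__main__":
    for name, value in simulate_complete_case_bias().items():
        print(f"{name}: {value:.3f}")
```

Because missingness here depends on the outcome as well as the group, the complete case estimate of the group difference drifts away from the pre-specified value, while the full-data estimate stays close to it. A fuller experiment would repeat this over many seeds and add the other comparison methods.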

Principal findings: Imputation using U.S. Census information reduced bias for both continuous and dichotomous outcomes.
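The abstract does not say how surname and address information were combined. One widely used approach is Bayesian updating of surname-based probabilities with geography-based probabilities, sketched below purely as an illustration. The function name `combine_probabilities`, the category labels, and all numbers are hypothetical. The sketch assumes that surname and area are conditionally independent given race/ethnicity and that every overall probability is positive.

```python
def combine_probabilities(p_given_surname, p_given_area, p_overall):
    """Combine two sets of category probabilities with Bayes' rule.

    Under the assumed conditional independence,
    P(category | surname, area) is proportional to
    P(category | surname) * P(category | area) / P(category).
    """
    unnormalized = {
        category: p_given_surname[category] * p_given_area[category] / prior
        for category, prior in p_overall.items()
    }
    total = sum(unnormalized.values())
    # Rescale so the combined probabilities sum to 1.
    return {category: value / total for category, value in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical inputs, not Census figures.
    posterior = combine_probabilities(
        p_given_surname={"group_a": 0.7, "group_b": 0.3},
        p_given_area={"group_a": 0.4, "group_b": 0.6},
        p_overall={"group_a": 0.5, "group_b": 0.5},
    )
    for category, probability in posterior.items():
        print(f"{category}: {probability:.3f}")
```

Probabilities of this kind could then serve as predictors in a multiple imputation model alongside clinical factors, which is one way to read "multiple imputation incorporating surname and address information".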

Conclusions: The new method reduces bias when race/ethnicity is partially, nonrandomly missing.

Keywords: Multiple imputation; U.S. Census location and surname data; health disparities; race and ethnicity.

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Adolescent
  • Age Factors
  • Asthma / ethnology
  • Attention Deficit Disorder with Hyperactivity / ethnology
  • Bias
  • Black or African American / statistics & numerical data
  • Censuses*
  • Child
  • Child, Preschool
  • Data Collection / methods*
  • Electronic Health Records / statistics & numerical data*
  • Ethnicity / statistics & numerical data*
  • Female
  • Hispanic or Latino / statistics & numerical data
  • Humans
  • Infant
  • Infant, Newborn
  • Male
  • Names
  • Racial Groups / statistics & numerical data*
  • Research Design
  • Sex Factors
  • Socioeconomic Factors
  • United States