Inflation of type I error rates due to differential misclassification in EHR-derived outcomes: Empirical illustration using breast cancer recurrence

Pharmacoepidemiol Drug Saf. 2019 Feb;28(2):264-268. doi: 10.1002/pds.4680. Epub 2018 Oct 30.


Purpose: Many outcomes derived from electronic health records (EHR) not only are imperfect but also may suffer from exposure-dependent differential misclassification due to variability in the quality and availability of EHR data across exposure groups. The objective of this study was to quantify the inflation of type I error rates that can result from differential outcome misclassification.

Methods: We used data on gold-standard and EHR-derived second breast cancers in a cohort of women with a prior breast cancer diagnosis from 1993 to 2006 enrolled in Kaiser Permanente Washington. We simulated an exposure that was independent of the true outcome status. A surrogate outcome was then simulated with varying sensitivity and specificity according to exposure status. We estimated the type I error rate for a test of association relating this exposure to the surrogate outcome, while varying outcome sensitivity and specificity in exposed individuals.
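
The simulation design described above can be sketched as follows. This is a minimal illustration, not the authors' code. True outcomes are drawn from a hypothetical prevalence rather than taken from the gold-standard cohort data, the pooled two-proportion z-test is an assumed stand-in for the study's test of association, and the function names, sample size, prevalence, and number of replicates are illustrative choices.

```python
import math
import random


def two_proportion_p_value(x1, n1, x0, n0):
    """Two-sided p-value from a pooled two-proportion z-test (assumed test)."""
    pooled = (x1 + x0) / (n1 + n0)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x0 / n0) / se
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def type_i_error_rate(n, prevalence, p_exposed,
                      se_exp, sp_exp, se_unexp, sp_unexp,
                      n_sims=200, alpha=0.05, seed=0):
    """Fraction of simulated null data sets in which the test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # exposure group -> [group size, surrogate-positive count]
        counts = {0: [0, 0], 1: [0, 0]}
        for _ in range(n):
            # Hypothetical: true outcome drawn at random, not from cohort data.
            true_outcome = rng.random() < prevalence
            # Exposure is generated independently of the true outcome.
            exposed = 1 if rng.random() < p_exposed else 0
            se, sp = (se_exp, sp_exp) if exposed else (se_unexp, sp_unexp)
            # Surrogate is positive with probability Se (true case)
            # or 1 - Sp (true non-case).
            p_positive = se if true_outcome else 1 - sp
            surrogate = rng.random() < p_positive
            counts[exposed][0] += 1
            counts[exposed][1] += surrogate
        n1, x1 = counts[1]
        n0, x0 = counts[0]
        if n1 == 0 or n0 == 0:
            continue
        if two_proportion_p_value(x1, n1, x0, n0) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Illustrative settings only; these do not reproduce the cohort.
    nondifferential = type_i_error_rate(1000, 0.10, 0.5, 0.85, 0.90, 0.85, 0.90)
    differential = type_i_error_rate(1000, 0.10, 0.5, 0.85, 0.80, 0.85, 0.90)
    print("nondifferential:", nondifferential)
    print("differential specificity (80% vs 90%):", differential)
```

Because the exposure never depends on the true outcome, every rejection is a false positive. The rejection fraction is therefore an estimate of the type I error rate, and its size depends strongly on the assumed sample size and prevalence.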

Results: Type I error rates were substantially inflated above the nominal level (5%) for even modest departures from nondifferential misclassification. Holding sensitivity in exposed and unexposed groups at 85%, a difference in specificity of 10 percentage points between the exposed and unexposed (80% vs 90%) resulted in a 36% type I error rate. Type I error was inflated more by differential specificity than by differential sensitivity.
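
One way to see why differential specificity matters more is to write out the expected proportion of surrogate-positive individuals in each exposure group. The notation below is introduced here for illustration and does not appear in the abstract: \pi is the true outcome prevalence (identical in both groups because exposure is independent of the true outcome), and Se_e and Sp_e are the sensitivity and specificity in exposure group e.

```latex
\begin{align*}
p^{*}_{e} &= \mathrm{Se}_{e}\,\pi + (1-\mathrm{Sp}_{e})(1-\pi), \qquad e \in \{0,1\} \\
p^{*}_{1} - p^{*}_{0} &= (\mathrm{Se}_{1}-\mathrm{Se}_{0})\,\pi + (\mathrm{Sp}_{0}-\mathrm{Sp}_{1})(1-\pi)
\end{align*}
```

With equal sensitivity (85%) and specificity of 80% vs 90%, the spurious difference is 0.10(1 - \pi), whereas a 10-point difference in sensitivity alone would give 0.10\pi. Whenever the outcome affects fewer than half of the cohort, 1 - \pi exceeds \pi, so the specificity term dominates.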

Conclusions: Differential outcome misclassification may induce spurious findings. Researchers using EHR-derived outcomes should use misclassification-adjusted methods whenever possible or conduct sensitivity analyses to investigate the possibility of false-positive findings, especially for exposures that may be related to the accuracy of outcome ascertainment.
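
As a rough illustration of what a simple sensitivity analysis could look like, the sketch below back-corrects an observed surrogate-positive proportion using assumed values of sensitivity and specificity (the standard Rogan-Gladen style correction). It is not the method used or recommended in the study. The function name and the counts are hypothetical, and in practice the accuracy values would come from validation data and would be varied over a plausible range.

```python
def corrected_risk(observed_positives, n, sensitivity, specificity):
    """Back-correct an observed proportion for outcome misclassification.

    Solves p_obs = Se * p + (1 - Sp) * (1 - p) for the true proportion p.
    """
    youden = sensitivity + specificity - 1
    if youden <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    observed = observed_positives / n
    corrected = (observed - (1 - specificity)) / youden
    # Sampling noise can push the estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, corrected))


# Hypothetical counts: the observed risks differ (26.5% vs 17.5%) ...
exposed = corrected_risk(265, 1000, sensitivity=0.85, specificity=0.80)
unexposed = corrected_risk(175, 1000, sensitivity=0.85, specificity=0.90)
# ... but both corrected risks are approximately 0.10.
print(round(exposed, 3), round(unexposed, 3))
```

In this constructed example the apparent difference between groups disappears once group-specific accuracy is taken into account, which is the kind of check the conclusions call for when exposure may be related to outcome ascertainment.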

Keywords: electronic health record; misclassification; outcome; pharmacoepidemiology; phenotype; validation.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Bias
  • Breast Neoplasms / epidemiology*
  • Cohort Studies
  • Computer Simulation
  • Data Accuracy
  • Data Interpretation, Statistical
  • Electronic Health Records / statistics & numerical data*
  • Female
  • Humans
  • Middle Aged
  • Models, Statistical
  • Neoplasm Recurrence, Local / epidemiology*
  • Outcome Assessment, Health Care / methods
  • Outcome Assessment, Health Care / statistics & numerical data*
  • Pharmacoepidemiology / methods
  • Pharmacoepidemiology / statistics & numerical data*
  • Sensitivity and Specificity
  • Washington / epidemiology