Tradeoffs between accuracy measures for electronic health care data algorithms

J Clin Epidemiol. 2012 Mar;65(3):343-349.e2. doi: 10.1016/j.jclinepi.2011.09.002. Epub 2011 Dec 23.


Objective: We review the uses of electronic health care data algorithms, measures of their accuracy, and reasons for prioritizing one measure of accuracy over another.

Study design and setting: We use real studies to illustrate the variety of uses of automated health care data in epidemiologic and health services research. Hypothetical examples show the impact of different types of misclassification when algorithms are used to ascertain exposure and outcome.
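The effect of outcome misclassification on a risk ratio can be sketched with a short calculation. The example below is illustrative only and is not taken from the article: the true risks (2% in the exposed, 1% in the unexposed) and the sensitivity and specificity values are hypothetical, and misclassification is assumed to be nondifferential (the same in both exposure groups).

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Expected proportion of subjects the algorithm labels as cases.

    True cases are detected with probability `sensitivity`; non-cases are
    falsely labeled as cases with probability (1 - specificity).
    """
    return sensitivity * true_risk + (1 - specificity) * (1 - true_risk)


def observed_risk_ratio(risk_exposed, risk_unexposed, sensitivity, specificity):
    """Risk ratio computed from algorithm-classified outcomes,
    assuming nondifferential misclassification."""
    return (
        observed_risk(risk_exposed, sensitivity, specificity)
        / observed_risk(risk_unexposed, sensitivity, specificity)
    )


# Hypothetical true risks: 2% (exposed) vs. 1% (unexposed), true risk ratio 2.0.
true_rr = 0.02 / 0.01

# Scenario A: poor sensitivity, perfect specificity.
rr_low_sens = observed_risk_ratio(0.02, 0.01, sensitivity=0.60, specificity=1.00)

# Scenario B: perfect sensitivity, slightly imperfect specificity.
rr_low_spec = observed_risk_ratio(0.02, 0.01, sensitivity=1.00, specificity=0.98)

print(f"true RR:              {true_rr:.2f}")
print(f"RR, sensitivity 0.60: {rr_low_sens:.2f}")
print(f"RR, specificity 0.98: {rr_low_spec:.2f}")
```

Under these assumptions, missed cases shrink both risks proportionally and leave the risk ratio unchanged. A small loss of specificity adds false positives to both groups and pulls the ratio toward the null.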

Results: High algorithm sensitivity is important for reducing the costs and burdens associated with the use of a more accurate measurement tool, for enhancing study inclusiveness, and for ascertaining common exposures. High specificity is important for classifying outcomes. High positive predictive value is important for identifying a cohort of persons with a condition of interest when that cohort need not be representative of, or include, everyone with the condition. Finally, a high negative predictive value is important for reducing the likelihood that study subjects have an exclusionary condition.
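The four accuracy measures named above follow from a 2x2 validation table comparing the algorithm with a reference standard. The sketch below uses the standard definitions. The function name and the counts are hypothetical and do not come from the article.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 validation table.

    tp: algorithm positive, reference positive (true positives)
    fp: algorithm positive, reference negative (false positives)
    fn: algorithm negative, reference positive (false negatives)
    tn: algorithm negative, reference negative (true negatives)
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases the algorithm finds
        "specificity": tn / (tn + fp),  # share of non-cases correctly ruled out
        "ppv": tp / (tp + fp),          # share of algorithm positives that are true cases
        "npv": tn / (tn + fn),          # share of algorithm negatives that are true non-cases
    }


# Hypothetical validation sample: 1,000 people, 100 of whom truly have the condition.
m = accuracy_measures(tp=80, fp=90, fn=20, tn=810)

for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

In this made-up sample, sensitivity (0.80) and specificity (0.90) both look reasonable, yet fewer than half of the algorithm-identified cases are true cases because the condition is uncommon. This is one reason a study that needs a clean cohort of affected persons might prioritize positive predictive value over the other measures.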

Conclusion: Epidemiologists must often prioritize one measure of accuracy over another when generating an algorithm for use in their study. We recommend researchers publish all tested algorithms, including those without acceptable accuracy levels, to help future studies refine and apply algorithms that are well suited to their objectives.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Aged
  • Algorithms*
  • Bias
  • Clinical Coding
  • Databases, Factual / standards
  • Databases, Factual / statistics & numerical data*
  • Electronic Health Records*
  • Epidemiologic Methods*
  • Female
  • Health Services Research / statistics & numerical data*
  • Humans
  • Sensitivity and Specificity