Evaluation of matched control algorithms in EHR-based phenotyping studies: a case study of inflammatory bowel disease comorbidities

J Biomed Inform. 2014 Dec:52:105-11. doi: 10.1016/j.jbi.2014.08.012. Epub 2014 Sep 6.


The success of many population studies is determined by proper matching of cases to controls. Some of the confounding and bias that afflict electronic health record (EHR)-based observational studies may be reduced by creating effective methods for finding adequate controls. We implemented a method to match case and control populations to compensate for sparse and unequal data collection practices common in EHR data. We did this by matching the healthcare utilization of patients after observing that more complete data was collected on high healthcare utilization patients vs. low healthcare utilization patients. In our results, we show that many of the anomalous differences in population comparisons are mitigated using this matching method compared to other traditional age and gender-based matching. As an example, the comparison of the disease associations of ulcerative colitis and Crohn's disease show differences that are not present when the controls are chosen in a random or even a matched age/gender/race algorithm. In conclusion, the use of healthcare utilization-based matching algorithms to find adequate controls greatly enhanced the accuracy of results in EHR studies. Full source code and documentation of the control matching methods is available at https://community.i2b2.org/wiki/display/conmat/.

Keywords: Comorbidity; Controls; EHR; Inflammatory bowel disease; Matching.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Case-Control Studies
  • Comorbidity*
  • Electronic Health Records / classification*
  • Humans
  • Inflammatory Bowel Diseases / epidemiology*
  • Medical Informatics / methods*
  • Patient Acceptance of Health Care