Linking mothers and infants within electronic health records: a comparison of deterministic and probabilistic algorithms

Pharmacoepidemiol Drug Saf. 2015 Jan;24(1):45-51. doi: 10.1002/pds.3728. Epub 2014 Nov 18.


Purpose: To compare probabilistic and deterministic algorithms for linking mothers and infants within electronic health records (EHRs) to support pregnancy outcomes research.

Methods: The study population comprised women enrolled in Group Health (Washington State, USA) who delivered a liveborn infant from 2001 through 2008 (N = 33,093 deliveries) and infant members born in those years. We linked women to infants by surname, address, and dates of birth and delivery using deterministic and probabilistic algorithms. In a subset previously linked using "gold standard" identifiers (N = 14,449), we assessed each approach's sensitivity and positive predictive value (PPV). For deliveries with no "gold standard" linkage (N = 18,644), we compared the algorithms' linkage proportions. We repeated our analyses in an independent test set of deliveries from 2009 through 2013. We reviewed medical records to validate a sample of pairs apparently linked by one algorithm but not the other (N = 51, or 1.4% of discordant pairs).
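The abstract does not describe the matching rules themselves, so the following is only a minimal sketch of how the two families of algorithm generally differ, not the study's implementation. It assumes a deterministic rule that requires exact agreement on every field, and a probabilistic rule in the Fellegi-Sunter style that sums log-likelihood weights per field and compares the total to a threshold. The `Record` type, the m/u probabilities, the one-day date tolerance, and the threshold of 8.0 are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from math import log2


@dataclass(frozen=True)
class Record:
    """Hypothetical record: a mother's delivery or an infant's birth."""
    surname: str
    address: str
    event_date: date  # delivery date for a mother, birth date for an infant


# Hypothetical (m, u) probabilities per field, for illustration only:
# m = P(field agrees | true mother-infant pair)
# u = P(field agrees | unrelated pair)
M_U = {
    "surname": (0.90, 0.01),
    "address": (0.85, 0.001),
    "date": (0.98, 0.003),
}


def agreement(mother: Record, infant: Record, date_tolerance_days: int = 1) -> dict:
    """Return field-by-field agreement flags for a candidate pair."""
    return {
        "surname": mother.surname.strip().lower() == infant.surname.strip().lower(),
        "address": mother.address.strip().lower() == infant.address.strip().lower(),
        "date": abs((mother.event_date - infant.event_date).days) <= date_tolerance_days,
    }


def deterministic_match(mother: Record, infant: Record) -> bool:
    """Link only if every field agrees."""
    return all(agreement(mother, infant).values())


def match_weight(mother: Record, infant: Record) -> float:
    """Sum Fellegi-Sunter style log2 weights over all fields."""
    total = 0.0
    for field, agrees in agreement(mother, infant).items():
        m, u = M_U[field]
        total += log2(m / u) if agrees else log2((1 - m) / (1 - u))
    return total


def probabilistic_match(mother: Record, infant: Record, threshold: float = 8.0) -> bool:
    """Link if the total weight reaches the (hypothetical) threshold."""
    return match_weight(mother, infant) >= threshold


# A pair whose address differs (for example, the family moved before the
# infant was enrolled) but whose surname and dates agree.
mother = Record("Nguyen", "12 Pine St", date(2005, 3, 14))
infant = Record("Nguyen", "98 Oak Ave", date(2005, 3, 14))
unrelated = Record("Nguyen", "7 Elm Rd", date(2006, 9, 2))

print(deterministic_match(mother, infant))     # False: address disagrees
print(probabilistic_match(mother, infant))     # True: other fields outweigh it
print(probabilistic_match(mother, unrelated))  # False: surname alone is too weak
```

In this sketch a single disagreeing field defeats the deterministic rule, whereas the probabilistic rule can still link the pair when the remaining fields carry enough weight. That tolerance is one plausible reason for a higher linkage proportion, though the abstract does not attribute the difference to any specific mechanism.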

Results: In the 2001-2008 "gold standard" population, the probabilistic algorithm had sensitivity 84.1% (95% CI, 83.5-84.7) and PPV 99.3% (95% CI, 99.1-99.4), while the deterministic algorithm had sensitivity 74.5% (95% CI, 73.8-75.2) and PPV 95.7% (95% CI, 95.4-96.0). In the test set, the probabilistic algorithm again had higher sensitivity and PPV. For deliveries in 2001-2008 with no "gold standard" linkage, the probabilistic algorithm found matched infants for 58.3% of deliveries and the deterministic algorithm for 52.8%. On medical record review, all sampled linked pairs (100%) appeared valid.
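As a rough arithmetic check on the reported sensitivity intervals, the sketch below applies a normal-approximation (Wald) interval to the reported proportions. It assumes the denominator for sensitivity is the 14,449 "gold standard" deliveries stated in the Methods; the abstract does not say which interval method the authors used, so the Wald formula and the helper `wald_ci` are assumptions made for illustration. PPV is omitted because its denominator (the number of pairs each algorithm linked) is not given.

```python
from math import sqrt


def wald_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% interval for a proportion p observed in n trials."""
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


N_GOLD = 14_449  # "gold standard" deliveries, from the Methods

# Reported sensitivities from the Results, expressed as proportions.
reported = {"probabilistic": 0.841, "deterministic": 0.745}

intervals = {}
for label, sensitivity in reported.items():
    lower, upper = wald_ci(sensitivity, N_GOLD)
    intervals[label] = f"{100 * lower:.1f}-{100 * upper:.1f}"
    print(f"{label}: {100 * sensitivity:.1f}% ({intervals[label]})")
```

Under these assumptions the intervals round to 83.5-84.7 and 73.8-75.2, which agree with the reported values. The two intervals do not overlap, consistent with the roughly 10 percentage point difference in sensitivity between the algorithms.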

Conclusions: A probabilistic algorithm improved linkage proportion and accuracy compared to a deterministic algorithm. Better linkage methods can increase the value of EHRs for pregnancy outcomes research.

Keywords: medical record linkage; pharmacoepidemiology; pregnancy outcome/epidemiology.

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Adolescent
  • Adult
  • Algorithms*
  • Delivery of Health Care / standards
  • Delivery of Health Care / statistics & numerical data
  • Electronic Health Records / standards*
  • Electronic Health Records / statistics & numerical data
  • Female
  • Humans
  • Infant Welfare* / statistics & numerical data
  • Infant, Newborn
  • Maternal Welfare* / statistics & numerical data
  • Medical Record Linkage / standards*
  • Mothers
  • Pregnancy
  • Young Adult