Probabilistic record linkage is a valid and transparent tool to combine databases without a patient identification number

J Clin Epidemiol. 2007 Sep;60(9):883-91. doi: 10.1016/j.jclinepi.2006.11.021. Epub 2007 May 17.


Objective: To describe the technical approach and subsequent validation of the probabilistic linkage of the three anonymous, population-based Dutch Perinatal Registries (LVR1 of midwives, LVR2 of obstetricians, and LNR of pediatricians/neonatologists). These registries do not share a unique identification number.

Study design and setting: A combination of probabilistic and deterministic record linkage techniques was applied using information about the mother, delivery, and child(ren) to link three known registries. Rewards for agreement and penalties for disagreement between corresponding variables were calculated based on the observed patterns of agreement and disagreement using maximum likelihood estimation. Special measures were developed to overcome linking difficulties in twins. A subsample of linked and nonlinked pairs was validated.
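As an illustration of how rewards and penalties of this kind are commonly computed, the sketch below uses the standard log-likelihood-ratio weights from Fellegi-Sunter style linkage. It is an assumption that the study's weights take this form; the abstract does not give the formula. The variable names and the m/u probabilities are hypothetical values chosen for the example, not estimates from the registries, and the maximum likelihood estimation step is not shown.

```python
import math

def field_weights(m, u):
    """Return (reward, penalty) in log2 units for one linking variable.

    m: probability that the variable agrees in a true match.
    u: probability that the variable agrees in a non-match.
    """
    reward = math.log2(m / u)               # added when the variable agrees
    penalty = math.log2((1 - m) / (1 - u))  # added when it disagrees
    return reward, penalty

def pair_score(agreement, params):
    """Sum the rewards and penalties over all linking variables of a record pair."""
    total = 0.0
    for field, agrees in agreement.items():
        m, u = params[field]
        reward, penalty = field_weights(m, u)
        total += reward if agrees else penalty
    return total

# Hypothetical m/u values, for illustration only (not taken from the study).
PARAMS = {
    "mother_birth_date": (0.95, 0.01),
    "postal_code": (0.90, 0.05),
    "delivery_date": (0.98, 0.02),
}

# One candidate pair: two variables agree, one disagrees.
example_pair = {
    "mother_birth_date": True,
    "postal_code": False,
    "delivery_date": True,
}
score = pair_score(example_pair, PARAMS)
print(f"total weight: {score:.2f}")
```

A pair whose total weight exceeds a chosen threshold would be treated as a link. In this sketch a disagreement on one error-prone variable lowers the score without ruling the pair out, which is how such a scheme tolerates errors in the linking variables.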

Results: Independent validation confirmed that the procedure successfully linked the three Dutch perinatal registries despite nontrivial error rates in the linking variables.

Conclusions: Probabilistic linkage techniques allowed the creation of a high-quality linked database from crude registry data. The developed procedures are generally applicable to the linkage of health data with partially identifying information. They provide useful source data even if cohorts are only partly overlapping and if, within the cohort, multiple entities and twins exist.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Female
  • Humans
  • Infant, Newborn
  • Information Storage and Retrieval*
  • Medical Record Linkage / methods*
  • Medical Records Systems, Computerized
  • Netherlands
  • Patient Identification Systems
  • Pregnancy
  • Probability
  • Public Health Informatics*
  • Registries