Probabilistic record linkage: relationships between file sizes, identifiers and match weights

Methods Inf Med. 2001 Jul;40(3):196-203.


This study investigates the relationships among file sizes, the amount of information contained in commonly used record linkage variables, and the amount of information needed for a successful probabilistic linkage project. We present an equation that predicts the amount of information needed for a successful linkage project. Match weights for variables commonly used in record linkage were measured using artificially created databases. Linkage algorithms were successful when the sum of the minimum weights for the variables used in a linkage exceeded the predicted cutoff. Linkage results were acceptable when this sum was near the predicted cutoff. This technique enables researchers to determine whether enough information exists to perform a successful probabilistic linkage.
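The comparison between summed match weights and a cutoff can be illustrated with a minimal sketch. The abstract does not state the paper's equation, so the cutoff below is an assumption: it uses a generic Bayesian odds argument (prior odds that a random record pair is a true match, updated to a target posterior probability). The weights follow the standard Fellegi-Sunter form log2(m/u), using one agreement weight per variable as a stand-in for the minimum weights the abstract refers to. The function names, the m and u probabilities, the file sizes, and the target probability are all hypothetical values chosen for illustration.

```python
import math


def agreement_weight(m: float, u: float) -> float:
    """Fellegi-Sunter agreement weight in bits: log2(m / u).

    m = P(variable agrees | pair is a true match)
    u = P(variable agrees | pair is not a match)
    """
    return math.log2(m / u)


def required_weight(n_a: int, n_b: int, expected_matches: int,
                    target_prob: float) -> float:
    """Total weight (bits) needed to reach a target posterior match probability.

    Assumption (not taken from the source): a generic Bayesian odds update,
    where the prior odds of a match for a random pair are
    expected_matches / (n_a * n_b - expected_matches).
    """
    prior_odds = expected_matches / (n_a * n_b - expected_matches)
    target_odds = target_prob / (1.0 - target_prob)
    return math.log2(target_odds) - math.log2(prior_odds)


# Hypothetical (m, u) pairs for three linkage variables.
variables = {
    "birth_date": (0.95, 1 / 3650),
    "sex": (0.98, 0.5),
    "zip_code": (0.90, 0.01),
}

# Sum of agreement weights across the variables used in the linkage.
total_weight = sum(agreement_weight(m, u) for m, u in variables.values())

# Hypothetical files of 10,000 records each with 5,000 expected true matches.
cutoff = required_weight(10_000, 10_000, 5_000, target_prob=0.95)

enough_information = total_weight >= cutoff
print(f"total weight = {total_weight:.2f} bits, cutoff = {cutoff:.2f} bits")
print("enough information" if enough_information else "not enough information")
```

Under these assumptions, larger files lower the prior odds of a match and so raise the cutoff, which is the qualitative relationship between file size and required information that the abstract describes.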

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Confidentiality*
  • Humans
  • Medical Record Linkage*
  • Probability*