Identifying Possible False Matches in Anonymized Hospital Administrative Data without Patient Identifiers

Health Serv Res. 2015 Aug;50(4):1162-78. doi: 10.1111/1475-6773.12272. Epub 2014 Dec 18.


Objective: To identify data linkage errors in the form of possible false matches, where two patients appear to share the same unique identification number.

Data source: Hospital Episode Statistics (HES) in England, United Kingdom.

Study design: Data on births and re-admissions for infants (April 1, 2011 to March 31, 2012; age 0-1 year) and adolescents (April 1, 2004 to March 31, 2011; age 10-19 years).

Data collection/extraction methods: Hospital records were pseudo-anonymized using an algorithm designed to link multiple records belonging to the same person. Six implausible clinical scenarios were considered possible false matches: multiple births sharing a HES patient identifier (HESID), re-admission after death, two birth episodes sharing a HESID, simultaneous admission at different hospitals, infant episodes coded as deliveries, and adolescent episodes coded as births.
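
Two of these scenario checks (simultaneous admission at different hospitals, and re-admission after death) could be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Episode` fields, the scenario labels, and the overlap rule are hypothetical, and real HES extracts use different field names. The overlap test uses strict date inequalities so that a same-day transfer between hospitals is not flagged, which is a design choice of this sketch rather than something the abstract specifies.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations
from typing import NamedTuple, Optional


class Episode(NamedTuple):
    """One hospital episode; all field names are hypothetical."""
    hesid: str                       # pseudonymized patient identifier
    hospital: str                    # provider code
    admitted: date
    discharged: date
    died_on: Optional[date] = None   # date of death recorded on this episode


def flag_possible_false_matches(episodes):
    """Return {hesid: set of scenario labels} for IDs with implausible records."""
    by_id = defaultdict(list)
    for ep in episodes:
        by_id[ep.hesid].append(ep)

    flags = defaultdict(set)
    for hesid, eps in by_id.items():
        # Scenario: simultaneous admission at different hospitals.
        # Strict inequalities mean a same-day transfer is not counted.
        for a, b in combinations(eps, 2):
            overlap = a.admitted < b.discharged and b.admitted < a.discharged
            if overlap and a.hospital != b.hospital:
                flags[hesid].add("simultaneous_admission")

        # Scenario: any admission dated after a recorded death.
        deaths = [ep.died_on for ep in eps if ep.died_on is not None]
        if deaths and any(ep.admitted > min(deaths) for ep in eps):
            flags[hesid].add("readmission_after_death")
    return dict(flags)


# Made-up records for illustration only.
episodes = [
    Episode("A1", "H1", date(2011, 5, 1), date(2011, 5, 10)),
    Episode("A1", "H2", date(2011, 5, 3), date(2011, 5, 4)),   # overlaps at H2
    Episode("B2", "H1", date(2011, 6, 1), date(2011, 6, 2)),
    Episode("B2", "H2", date(2011, 6, 2), date(2011, 6, 5)),   # same-day transfer
    Episode("C3", "H1", date(2011, 7, 1), date(2011, 7, 3), died_on=date(2011, 7, 3)),
    Episode("C3", "H1", date(2011, 8, 1), date(2011, 8, 2)),   # after death
]
result = flag_possible_false_matches(episodes)
print(result)
```

Flagged identifiers can then be reviewed or excluded at the data cleaning stage; the remaining scenarios would need episode-type and birth fields that this sketch does not model.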

Principal findings: Among 507,778 infants, possible false matches were relatively rare (n = 433, 0.1 percent). The most common scenario (simultaneous admission at two hospitals, n = 324) was more likely for infants with missing data, infants born preterm, and Asian infants. Among adolescents, this scenario (n = 320) was more common for males, younger patients, the Mixed ethnic group, and those re-admitted more frequently.

Conclusions: Researchers can identify clinically implausible scenarios and patients affected, at the data cleaning stage, to mitigate the impact of possible linkage errors.

Keywords: Computerized patient medical records; data linkage; data quality; medical errors.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Age Factors
  • Child
  • Data Collection / standards*
  • Data Collection / statistics & numerical data*
  • Female
  • Health Services Research
  • Hospital Administration / statistics & numerical data*
  • Humans
  • Infant
  • Infant, Newborn
  • Male
  • Reproducibility of Results
  • Sex Factors
  • Socioeconomic Factors
  • United Kingdom
  • Young Adult