Reducing patient re-identification risk for laboratory results within research datasets

J Am Med Inform Assoc. 2013 Jan 1;20(1):95-101. doi: 10.1136/amiajnl-2012-001026. Epub 2012 Jul 21.


Objective: To try to lower patient re-identification risks for biomedical research databases containing laboratory test results while also minimizing changes in clinical data interpretation.

Materials and methods: In our threat model, an attacker obtains 5-7 laboratory results from one patient and uses them as a search key to discover the corresponding record in a de-identified biomedical research database. To test our models, the existing Vanderbilt TIME database of 8.5 million Safe Harbor de-identified laboratory results from 61 280 patients was used. The uniqueness of unaltered laboratory results in the dataset was examined, and then two data perturbation models were applied-simple random offsets and an expert-derived clinical meaning-preserving model. A rank-based re-identification algorithm to mimic an attack was used. The re-identification risk and the retention of clinical meaning for each model's perturbed laboratory results were assessed.

Results: Differences in re-identification rates between the algorithms were small despite substantial divergence in altered clinical meaning. The expert algorithm maintained the clinical meaning of laboratory results better (affecting up to 4% of test results) than simple perturbation (affecting up to 26%).

Discussion and conclusion: With growing impetus for sharing clinical data for research, and in view of healthcare-related federal privacy regulation, methods to mitigate risks of re-identification are important. A practical, expert-derived perturbation algorithm that demonstrated potential utility was developed. Similar approaches might enable administrators to select data protection scheme parameters that meet their preferences in the trade-off between the protection of privacy and the retention of clinical meaning of shared data.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Biomedical Research
  • Clinical Laboratory Information Systems*
  • Computer Security*
  • Confidentiality*
  • Electronic Health Records*
  • Feasibility Studies
  • Humans
  • Information Dissemination
  • Medical Record Linkage*
  • United States