A flexible approach to distributed data anonymization

J Biomed Inform. 2014 Aug;50:62-76. doi: 10.1016/j.jbi.2013.12.002. Epub 2013 Dec 12.


Sensitive biomedical data is often collected from distributed sources, involving different information systems and different organizational units. Local autonomy and legal requirements create a need for privacy-preserving integration concepts. In this article, we focus on anonymization, which plays an important role in the re-use of clinical data and in the sharing of research data. We present a flexible solution for anonymizing distributed data in the semi-honest model. Prior to the anonymization procedure, an encrypted global view of the dataset is constructed by means of a secure multi-party computing (SMC) protocol. This global representation can then be anonymized. Our approach is not limited to specific anonymization algorithms but provides pre- and postprocessing for a broad spectrum of algorithms and many privacy criteria. We present an extensive analytical and experimental evaluation and discuss which types of methods and criteria are supported. Our prototype demonstrates the approach by implementing k-anonymity, ℓ-diversity, t-closeness and δ-presence with a globally optimal de-identification method in horizontally and vertically distributed setups. The experiments show that our method provides highly competitive performance and offers a practical and flexible solution for anonymizing distributed biomedical datasets.
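
As an illustration of one of the privacy criteria named above, the following minimal sketch checks k-anonymity on a small plaintext table: every combination of quasi-identifier values must occur in at least k records. It is not the authors' implementation and does not cover the SMC protocol or the encrypted global view. The function name `is_k_anonymous`, the column names, and the sample records are assumptions made for this example only.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if each quasi-identifier combination occurs at least k times.

    Hypothetical helper for illustration; not taken from the article.
    """
    # Group records by their tuple of quasi-identifier values and count each group.
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical records whose quasi-identifiers are already generalized.
records = [
    {"age": "30-39", "zip": "811**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "811**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "812**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "812**", "diagnosis": "diabetes"},
]

two_anonymous = is_k_anonymous(records, ["age", "zip"], 2)
three_anonymous = is_k_anonymous(records, ["age", "zip"], 3)
```

In this sample each (age, zip) group contains two records, so the table satisfies the criterion for k = 2 but not for k = 3. In a distributed setup such as the one described, a check of this kind would be applied to the integrated view rather than to each site's data alone.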

Keywords: Anonymization; Commutative encryption; Distribution; Personal data protection; Privacy; SMC; Secure multi-party computing.

MeSH terms

  • Algorithms
  • Medical Records Systems, Computerized*
  • Models, Theoretical
  • Privacy*