An Open Source Tool for Game Theoretic Health Data De-Identification

AMIA Annu Symp Proc. 2018 Apr 16;2017:1430-1439. eCollection 2017.


Biomedical data continues to grow in quantity and quality, creating new opportunities for research and data-driven applications. To realize these activities at scale, data must be shared beyond its initial point of collection. To maintain privacy, healthcare organizations often de-identify data, but they assume worst-case adversaries, inducing high levels of data corruption. Recently, game theory has been proposed to account for the incentives of data publishers and recipients (who attempt to re-identify patients), but this perspective has been more hypothetical than practical. In this paper, we report on a new game theoretic data publication strategy and its integration into the open source software ARX. We evaluate our implementation with an analysis of the relationship between data transformation, utility, and efficiency for over 30,000 demographic records drawn from the U.S. Census Bureau. The results indicate that our implementation is scalable and can be combined with various data privacy risk and quality measures.
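
The abstract does not spell out the payoff model, so the following is only a minimal sketch of the incentive-based reasoning it describes, under stated assumptions. It assumes the publisher picks one transformation option per record, that a rational recipient attempts re-identification only when the expected gain exceeds the cost of the attempt, and that the chance of success is one over the number of individuals matching the transformed record. All names (Option, adversary_attacks, publisher_payoff, best_option) and all numeric values are hypothetical and are not part of ARX.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """A hypothetical transformation option for a single record."""
    name: str
    benefit: float   # assumed publisher benefit from sharing at this level
    group_size: int  # assumed number of individuals matching the record


def adversary_attacks(option: Option, gain: float, cost: float) -> bool:
    # Assumption: the recipient attacks only if the expected gain
    # (gain times success probability 1/group_size) exceeds the cost.
    return gain / option.group_size > cost


def publisher_payoff(option: Option, gain: float, cost: float, loss: float) -> float:
    # Assumption: the publisher bears an expected loss only when attacked.
    if adversary_attacks(option, gain, cost):
        return option.benefit - loss / option.group_size
    return option.benefit


def best_option(options, gain: float, cost: float, loss: float) -> Option:
    # The publisher moves first and picks the option with the highest payoff,
    # anticipating the recipient's best response.
    return max(options, key=lambda o: publisher_payoff(o, gain, cost, loss))


# Illustrative values only; they are not taken from the paper.
OPTIONS = [
    Option("raw", benefit=10.0, group_size=1),
    Option("generalized", benefit=8.0, group_size=5),
    Option("suppressed", benefit=0.0, group_size=1000),
]

if __name__ == "__main__":
    choice = best_option(OPTIONS, gain=20.0, cost=5.0, loss=30.0)
    print(choice.name)
```

With these illustrative numbers, releasing the raw record invites an attack, while the generalized record makes an attack unprofitable and retains most of the benefit. A worst-case model, by contrast, would treat every option as attacked and tend toward heavier transformation.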

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Censuses
  • Confidentiality
  • Data Accuracy
  • Data Anonymization*
  • Game Theory*
  • Humans
  • Software*
  • United States