Am J Hum Genet. 2017 Feb 2;100(2):316-322.
doi: 10.1016/j.ajhg.2016.12.002. Epub 2017 Jan 5.

Expanding Access to Large-Scale Genomic Data While Promoting Privacy: A Game Theoretic Approach

Free PMC article

Zhiyu Wan et al. Am J Hum Genet.

Abstract

Emerging scientific endeavors are creating big data repositories that hold information on millions of individuals. Sharing these data in a privacy-respecting manner could lead to important discoveries, but high-profile demonstrations have shown that links between de-identified genomic data and named persons can sometimes be reestablished. Such re-identification attacks have focused on worst-case scenarios and have spurred the adoption of data-sharing practices that unnecessarily impede research. To mitigate concerns, organizations have traditionally relied upon legal deterrents, such as data use agreements, and are considering suppressing or adding noise to genomic variants. In this report, we use a game theoretic lens to develop more effective, quantifiable protections for genomic data sharing. This approach is fundamentally different because it accounts for adversarial behavior and capabilities and tailors protections to anticipated recipients with reasonable resources, not to adversaries with unlimited means. We demonstrate this approach via a new public resource with genomic summary data from over 8,000 individuals, the Sequence and Phenotype Integration Exchange (SPHINX), and show that risks can be balanced against utility more effectively than with traditional approaches. We further show the generalizability of this framework by applying it to other genomic data collection and sharing endeavors. Recognizing that such models depend on a variety of parameters, we perform extensive sensitivity analyses to show that our findings are robust to their fluctuations.

Keywords: Electronic Medical Records and Genomics Network; Sequence and Phenotype Integration Exchange; adversarial modeling; game theory; genetic algorithm; genomic data privacy; genomic data sharing policy; re-identification risk; sensitivity analysis; summary statistics.

Figures

Figure 1
A Comparison of Genomic Summary Data Sharing Policies for Participants in the SPHINX Program. The compared policies include (1) the single-nucleotide polymorphism (SNP) suppression policies, which rely only on hiding genomic regions (blue dots), (2) the existing SNP suppression policy, according to Sankararaman et al.’s approach (red circle), (3) the data use agreement (DUA) policy, which relies only on a legally enforceable contract (gold square), (4) the game theoretic policy, which allows for a combination of a DUA and SNP suppression in a Stackelberg framework (brown triangle), (5) the no-risk game theoretic policy, which ensures no attack is committed by the recipient (green outlined triangle), and (6) the no SNP suppression policy, which illustrates what transpires when no DUA or SNP suppression is applied (purple circle). Utility is directly related to the absolute difference between the minor allele frequencies of shared SNPs in the study and their known minor allele frequencies in the underlying reference population (a utility score of 1 is achieved when all SNPs are shared). Privacy is inversely related to risk, the likelihood that a recipient succeeds in compromising the privacy protection of targeted individuals (a privacy score of 1 is achieved when no attacks are successful, that is, when no risk exists). A higher payoff value represents a more desirable option. SPHINX, Sequence and Phenotype Integration Exchange.
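The utility and privacy scores described in the Figure 1 caption can be made concrete with a short sketch. It assumes utility is the share of the total |study MAF - reference MAF| divergence carried by the shared SNPs (so it equals 1 when all SNPs are shared) and that privacy is 1 minus the fraction of targets successfully attacked. The function names and the equal-weight `combined_payoff` are hypothetical; the paper's actual payoff function may combine the two scores differently.

```python
from typing import Sequence


def utility_score(study_maf: Sequence[float], ref_maf: Sequence[float],
                  shared: Sequence[bool]) -> float:
    """Fraction of total |study MAF - reference MAF| divergence kept by shared SNPs.

    Returns 1.0 when every SNP is shared, matching the caption's normalization.
    """
    divergence = [abs(s - r) for s, r in zip(study_maf, ref_maf)]
    total = sum(divergence)
    if total == 0:
        # Convention: with no divergence anywhere, suppression loses nothing
        return 1.0
    kept = sum(d for d, keep in zip(divergence, shared) if keep)
    return kept / total


def privacy_score(attack_succeeded: Sequence[bool]) -> float:
    """1 minus the fraction of targeted individuals who were successfully attacked."""
    if not attack_succeeded:
        return 1.0  # no targets, so no risk
    return 1.0 - sum(attack_succeeded) / len(attack_succeeded)


def combined_payoff(utility: float, privacy: float, weight: float = 0.5) -> float:
    """Hypothetical linear trade-off between utility and privacy."""
    return weight * utility + (1.0 - weight) * privacy


if __name__ == "__main__":
    study = [0.5, 0.25, 0.125]   # hypothetical study MAFs
    ref = [0.25, 0.25, 0.0]      # hypothetical reference MAFs
    u = utility_score(study, ref, [True, True, False])  # keeps 0.25 of 0.375 total
    p = privacy_score([True, False, False, False])      # 1 of 4 targets compromised
    print(round(u, 3), p, round(combined_payoff(u, p), 3))
```

Note that a SNP whose study MAF matches the reference contributes nothing to utility in this sketch, so suppressing it costs nothing.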
Figure 2
The Genomic Data Sharing Process. In this process, the sharer sets a genomic data sharing policy (A), a recipient decides whether to attack targets in the received data (B), and the resulting overall payoffs are shown (C). SNP, single-nucleotide polymorphism; DUA, data use agreement.
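The leader-follower process in Figure 2 can be sketched as backward induction in a Stackelberg game: for each candidate policy, the sharer first predicts the recipient's best response, then keeps the policy with the highest payoff given that response. This is a minimal illustration, not the authors' implementation. The payoff forms, the `Policy` and `GameParams` names, and every numeric value are hypothetical, and the search is a plain enumeration over a few candidates rather than the genetic algorithm listed in the keywords.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Policy:
    use_dua: bool        # whether a data use agreement is imposed
    utility: float       # utility retained after SNP suppression, in [0, 1]
    success_prob: float  # chance an attack on a target succeeds under this release


@dataclass(frozen=True)
class GameParams:
    # Every value below is a hypothetical placeholder, not taken from the paper
    gain: float = 100.0       # recipient's gain from a successful attack
    cost: float = 10.0        # recipient's cost of attempting an attack
    penalty: float = 100.0    # DUA penalty if a violation is detected
    detect_prob: float = 0.5  # probability a DUA violation is detected
    benefit: float = 100.0    # sharer's value for sharing at full utility
    loss: float = 80.0        # sharer's expected loss per successful attack


def recipient_attacks(policy: Policy, g: GameParams) -> bool:
    """Follower's best response: attack only if the expected net gain is positive."""
    expected = g.gain * policy.success_prob - g.cost
    if policy.use_dua:
        expected -= g.detect_prob * g.penalty
    return expected > 0


def sharer_payoff(policy: Policy, g: GameParams) -> float:
    """Leader's payoff, anticipating the follower's best response."""
    expected_loss = g.loss * policy.success_prob if recipient_attacks(policy, g) else 0.0
    return g.benefit * policy.utility - expected_loss


def stackelberg_policy(candidates: Iterable[Policy], g: GameParams) -> Policy:
    """Backward induction: pick the policy that is best after the follower responds."""
    return max(candidates, key=lambda pol: sharer_payoff(pol, g))


if __name__ == "__main__":
    params = GameParams()
    candidates = [
        Policy(use_dua=False, utility=1.0, success_prob=0.9),   # share everything
        Policy(use_dua=True, utility=1.0, success_prob=0.9),    # DUA only
        Policy(use_dua=False, utility=0.6, success_prob=0.05),  # heavy suppression only
        Policy(use_dua=True, utility=0.9, success_prob=0.5),    # DUA + light suppression
    ]
    best = stackelberg_policy(candidates, params)
    print(best, sharer_payoff(best, params))
```

With these placeholder values, neither the DUA alone nor suppression alone wins: the DUA combined with light suppression deters the attack while keeping most of the utility, which mirrors the combined policy described for Figure 1.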
Figure 3
Comparisons of Four Protection Policies for the SPHINX Program with a Varying Penalty against the Genomic Inference Attack. The compared policies include (1) the optimal game theoretic solution (brown lines), (2) the game theoretic solution that ensures no attack is successful (black lines), (3) the data use agreement (DUA) (yellow lines), and (4) the single-nucleotide polymorphism (SNP) suppression solution with no penalty (blue lines). The overall payoff (the main graph on the right) combines (1) the privacy protection afforded to the targeted individuals (the upper graph on the left) and (2) the utility of the set of SNPs that are shared (the lower graph on the left). SPHINX, Sequence and Phenotype Integration Exchange.
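Figure 3 varies the penalty. Under the simple expected-value rule assumed in this sketch (the recipient attacks only when gain x success probability - cost - detection probability x penalty is positive), the smallest penalty that makes a DUA deterrent on its own can be solved for directly. The function names and all numbers are hypothetical; the sketch only shows why lowering the attack's success probability, for example through SNP suppression, reduces the penalty needed.

```python
def min_deterrent_penalty(gain: float, success_prob: float, cost: float,
                          detect_prob: float) -> float:
    """Smallest DUA penalty at which attacking no longer has positive expected value.

    Assumes the recipient attacks only when
    gain * success_prob - cost - detect_prob * penalty > 0.
    """
    net = gain * success_prob - cost
    if net <= 0:
        return 0.0           # attack is already unprofitable without any penalty
    if detect_prob <= 0:
        return float("inf")  # an undetectable violation cannot be deterred by a penalty
    return net / detect_prob


def attacks(gain: float, success_prob: float, cost: float,
            detect_prob: float, penalty: float) -> bool:
    """Recipient's decision under the same assumed expected-value rule."""
    return gain * success_prob - cost - detect_prob * penalty > 0


if __name__ == "__main__":
    # Hypothetical values: more suppression means a lower success probability
    for success_prob in (0.9, 0.5, 0.05):
        threshold = min_deterrent_penalty(100.0, success_prob, 10.0, 0.5)
        print(f"success_prob={success_prob}: deterred at penalty >= {threshold:.1f}")
```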
Figure 4
Comparisons of Four Protection Policies for a Range of Genomic Data Sharing Programs with Varying Prior Probabilities against the Genomic Inference Attack. The compared policies include (1) the optimal game theoretic solution (brown bars filled with downward diagonal pattern), (2) the game theoretic solution that ensures no attack is successful (black bars with no fill), (3) the data use agreement (DUA) (gold bars filled with checkerboard pattern), and (4) the single-nucleotide polymorphism (SNP) suppression solution (blue bars with solid fill). The overall payoff (the main graph on the right) is the result of combining (1) the privacy protection afforded to the targeted individuals (the upper graph on the left) and (2) the utility in the set of SNPs that are shared (the lower graph on the left). PMI, Precision Medicine Initiative; MVP, Million Veteran Program; SPHINX, Sequence and Phenotype Integration Exchange; BioVU, de-identified biorepository of Vanderbilt University Medical Center; RDCRN, Rare Diseases Clinical Research Network.
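Figure 4 varies the prior probability across programs. One standard way a prior can enter a membership-inference model, assumed here rather than taken from the paper, is Bayes' rule: the same test provides much weaker evidence of membership when the prior probability that a target is in the study is small. The function name and the test's true and false positive rates below are hypothetical.

```python
def posterior_membership(prior: float, tpr: float, fpr: float) -> float:
    """Bayes' rule: P(target is in the study | membership test flags the target).

    prior: probability the target is in the study before any test is run
    tpr:   probability the test flags a true member (hypothetical)
    fpr:   probability the test flags a non-member (hypothetical)
    """
    flagged = prior * tpr + (1.0 - prior) * fpr
    if flagged == 0:
        return 0.0  # the test never flags anyone, so it carries no evidence
    return prior * tpr / flagged


if __name__ == "__main__":
    # Same hypothetical test (tpr=0.9, fpr=0.05) under different hypothetical priors
    for prior in (0.5, 0.1, 0.001):
        print(prior, round(posterior_membership(prior, 0.9, 0.05), 3))
```

For a fixed test, the posterior falls sharply as the prior shrinks, which suggests why the best protection may differ between programs of different sizes.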
