Objective: Integration of patients' records across resources enhances analytics. To address privacy concerns, emerging strategies, such as Bloom filter encodings (BFEs), enable integration while obscuring identifiers. However, recent investigations demonstrate that BFEs are, in theory, vulnerable to cryptanalysis when the encoded identifiers are randomly selected from a public resource. This study investigates the extent to which the conditions for cryptanalysis hold for (1) real patient records and (2) a countermeasure that obscures the frequencies of the identifying values in encoded datasets.
Design: First, to investigate the strength of cryptanalysis against real patient records, we build BFEs from identifiers in an electronic medical record system and apply cryptanalysis using identifiers in a publicly available voter registry. Second, to investigate the countermeasure under ideal cryptanalysis conditions, we compose BFEs from identifiers that are randomly selected from a public voter registry.
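For readers unfamiliar with BFEs, the following is a minimal illustrative sketch of one common construction, not the encoding used in this study. It assumes identifiers are split into padded character bigrams and that each bigram sets a few bit positions chosen by seeded SHA-256 hashes; the function names, the filter length (`num_bits`), and the number of hash functions (`num_hashes`) are all hypothetical choices made for illustration.

```python
import hashlib


def bigrams(value):
    """Split an identifier into padded, lowercased character bigrams (assumed tokenization)."""
    padded = f"_{value.lower()}_"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def bloom_encode(value, num_bits=100, num_hashes=4):
    """Encode an identifier as a bit list; num_bits and num_hashes are illustrative defaults."""
    bits = [0] * num_bits
    for gram in bigrams(value):
        for seed in range(num_hashes):
            # Seeded SHA-256 stands in for a family of independent hash functions.
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            position = int.from_bytes(digest[:8], "big") % num_bits
            bits[position] = 1
    return bits


def dice_similarity(a, b):
    """Dice coefficient between two bit lists of equal length."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 0.0
```

Because similar identifiers share bigrams, their encodings share set bits, which is what allows approximate matching without revealing the identifiers. The same property means that frequent identifiers produce frequent bit patterns, which is the kind of signal a frequency-based attack can exploit.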
Measurement: We use precision (ie, the rate of correctly re-identified encodings) and computational efficiency (ie, the time to complete cryptanalysis) to assess the performance of cryptanalysis against BFEs before and after application of the countermeasure.
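The two measures can be sketched as follows. This is an illustrative sketch only: it assumes precision is taken over the encodings to which the attack assigns a value, and that efficiency is measured as wall-clock time. The names `precision`, `timed`, `guesses`, and `truth` are hypothetical and do not come from the study.

```python
import time


def precision(guesses, truth):
    """Fraction of attempted re-identifications that are correct.

    guesses: dict mapping an encoding ID to the identifier the attack assigned.
    truth:   dict mapping an encoding ID to the identifier actually encoded.
    """
    if not guesses:
        return 0.0
    correct = sum(
        1 for enc_id, guessed in guesses.items() if truth.get(enc_id) == guessed
    )
    return correct / len(guesses)


def timed(attack, *args):
    """Run an attack callable and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = attack(*args)
    return result, time.perf_counter() - start
```

For example, an attack that assigns identifiers to two encodings and gets one right has a precision of 0.5 under this definition, regardless of how many encodings it left unassigned.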
Results: Cryptanalysis can achieve high precision when the encoded identifiers are composed of a random sample of a public resource (ie, a voter registry). However, we also find that the attack is less efficient and may not be practical for more realistic scenarios. By contrast, the proposed countermeasure makes cryptanalysis impractical in terms of both precision and efficiency.
Conclusions: Performance of cryptanalysis against BFEs based on patient data is significantly lower than theoretical estimates. The proposed countermeasure makes BFEs resistant to known practical attacks.