Efficient Differentially Private Methods for a Transmission Disequilibrium Test in Genome Wide Association Studies

Pac Symp Biocomput. 2022:27:85-96.

Abstract

To realize personalized medicine, it is important to investigate the relationship between diseases and human genomes. Large-scale genetic studies such as genome-wide association studies are often conducted for this purpose, but releasing their statistics unmodified carries a risk of identifying individuals. In this study, we propose new efficient differentially private methods for the transmission disequilibrium test, a family-based association test. Existing methods are computationally intensive and take a long time even for a small cohort. Moreover, for existing approximation methods, the sensitivity of the obtained values is not guaranteed. We present an exact algorithm with a time complexity of 𝒪(nm) for a dataset containing n families and m single nucleotide polymorphisms (SNPs). We also propose an approximation algorithm that is faster than the exact one and prove that the sensitivity of its scores is 1. Our experimental results demonstrate that our exact algorithm is 10,000 times faster than existing methods for a small cohort with 5,000 SNPs. The results also indicate that the proposed method is the first that can be applied to a large cohort, such as one with 10⁶ SNPs. In addition, we examine what kind of dataset is suitable for our approximation algorithm. Supplementary materials are available at https://github.com/ay0408/DP-trio-TDT.

MeSH terms

  • Algorithms
  • Computational Biology*
  • Genome, Human
  • Genome-Wide Association Study*
  • Humans
  • Linkage Disequilibrium
  • Polymorphism, Single Nucleotide