Bridging heterogeneous mutation data to enhance disease gene discovery

Brief Bioinform. 2021 Sep 2;22(5):bbab079. doi: 10.1093/bib/bbab079.


Bridging heterogeneous mutation data fills the gap between data categories and propels the discovery of disease-related genes. Genome-wide association studies (GWAS) infer significant mutation associations that link genotype and phenotype. However, because GWAS differ in size and quality, not all genuinely important variants pass multiple-testing correction. Meanwhile, mutation events widely reported in the literature reveal typical functional biological processes, including mutation types such as gain of function and loss of function. To bring these heterogeneous mutation data together, we propose a 'Gene-Disease Association prediction by Mutation Data Bridging (GDAMDB)' pipeline with a statistical generative model. The model learns the distribution parameters of mutation associations and mutation types, and recovers false-negative GWAS mutations that fail the significance test but are supported in the literature by evidence of functional biological processes. We applied GDAMDB to Alzheimer's disease (AD) and predicted 79 AD-associated genes. Of these, 12 come from the original GWAS, 60 are supported as AD-related by other GWAS or literature reports, and the rest are newly predicted genes. Our model enhances GWAS-based gene association discovery by effectively combining it with text mining results. This positive result indicates that bridging heterogeneous mutation data contributes to the discovery of novel disease-related genes.
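The false-negative problem described above can be illustrated with a minimal sketch. This is not the GDAMDB generative model, whose details are not given here. It is a simple rule-based stand-in that assumes a Bonferroni correction, a hypothetical looser "suggestive" cutoff, and a text-mined mutation-type label per variant. The `Variant` class, the `split_variants` function, the gene labels and all numeric values are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Variant:
    gene: str                 # hypothetical gene label
    p_value: float            # GWAS association p-value
    lit_type: Optional[str]   # text-mined mutation type ("LOF", "GOF") or None


def split_variants(
    variants: List[Variant],
    alpha: float = 0.05,
    suggestive: float = 0.05,
) -> Tuple[float, List[Variant], List[Variant]]:
    """Separate variants that pass a Bonferroni threshold from sub-threshold
    variants that have literature support (candidate false negatives)."""
    # Bonferroni: divide the family-wise error rate by the number of tests.
    threshold = alpha / len(variants)
    significant = [v for v in variants if v.p_value <= threshold]
    # Assumption: a variant is a recovery candidate if it misses the corrected
    # threshold, stays under the looser cutoff, and has a literature label.
    candidates = [
        v for v in variants
        if threshold < v.p_value <= suggestive and v.lit_type is not None
    ]
    return threshold, significant, candidates


# Toy input with made-up values.
variants = [
    Variant("GENE_A", 0.001, None),   # passes the corrected threshold
    Variant("GENE_B", 0.02, "LOF"),   # fails correction, literature-supported
    Variant("GENE_C", 0.03, None),    # fails correction, no literature support
    Variant("GENE_D", 0.4, "GOF"),    # literature label but weak association
]
threshold, significant, candidates = split_variants(variants)
print([v.gene for v in significant], [v.gene for v in candidates])
```

With these toy values only GENE_B is flagged as a recovery candidate: it misses the corrected threshold but carries a literature-derived mutation type. A generative model such as the one proposed would replace the hard cutoffs with learned distribution parameters.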

Keywords: Alzheimer’s disease; GWAS; data fusion; generative model; text mining.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Alzheimer Disease / genetics*
  • Computational Biology / methods
  • Data Mining / methods
  • Gene Regulatory Networks / genetics
  • Genetic Association Studies / methods*
  • Genetic Predisposition to Disease / genetics*
  • Genome-Wide Association Study / methods*
  • Genotype
  • Humans
  • Mutation*
  • Phenotype
  • Polymorphism, Single Nucleotide*
  • Protein Interaction Maps / genetics
  • Reproducibility of Results