A knowledge graph-based disease-gene prediction system using multi-relational graph convolution networks

AMIA Annu Symp Proc. 2023 Apr 29;2022:468-476. eCollection 2022.


Identifying disease-gene associations is important for understanding the molecular mechanisms of diseases and for finding diagnostic markers and therapeutic targets. Many computational methods have been proposed to predict disease-related genes by integrating different biological databases into heterogeneous networks. However, it remains challenging to leverage heterogeneous topological and semantic information from multi-source biological data to enhance disease-gene prediction. In this study, we propose a knowledge graph-based disease-gene prediction system (GenePredict-KG) that models semantic relations extracted from various genotypic and phenotypic databases. We first constructed a knowledge graph comprising 2,292,609 associations among 73,358 entities, covering 14 types of phenotypic and genotypic relations and 7 entity types. We then developed a knowledge graph embedding model to learn low-dimensional representations of entities and relations, and used these embeddings to infer new disease-gene interactions. We compared GenePredict-KG with several state-of-the-art models using multiple evaluation metrics. GenePredict-KG achieved high performance [AUROC (area under the receiver operating characteristic curve) = 0.978, AUPR (area under the precision-recall curve) = 0.343, and MRR (mean reciprocal rank) = 0.244], outperforming the other state-of-the-art methods.
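
The three metrics reported above can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function names, the toy scores, and the gene identifiers (g1, g2, g3) are hypothetical. AUPR is approximated here by average precision, a common estimator that may differ from the one used in the study.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Average precision: mean of precision@k over the ranks k where a positive appears."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for k, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def mean_reciprocal_rank(rankings, truths):
    """Mean of 1/rank of the true item; each ranking is ordered best-first."""
    total = 0.0
    for ranking, truth in zip(rankings, truths):
        total += 1.0 / (ranking.index(truth) + 1)
    return total / len(truths)


# Toy data (hypothetical): predicted scores for five candidate disease-gene pairs,
# where label 1 marks a known association.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 0, 1, 0, 0]

# Toy rankings (hypothetical): candidate genes for two diseases, best-first.
toy_rankings = [["g1", "g2", "g3"], ["g2", "g1", "g3"]]
toy_truths = ["g1", "g1"]

print(auroc(toy_scores, toy_labels))
print(average_precision(toy_scores, toy_labels))
print(mean_reciprocal_rank(toy_rankings, toy_truths))
```

AUROC compares every positive with every negative, so it can stay high when true associations are rare among the candidates, whereas average precision and MRR depend on how close to the top of the ranking the true associations appear.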

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking*
  • Databases, Factual
  • Genotype
  • Humans
  • Knowledge
  • Pattern Recognition, Automated*