Heterogeneous networks integration for disease-gene prioritization with node kernels

Van Dinh Tran; Alessandro Sperduti; Rolf Backofen; Fabrizio Costa

doi:10.1093/bioinformatics/btaa008

Bioinformatics. 2020 May 1;36(9):2649-2656. doi: 10.1093/bioinformatics/btaa008.

Authors

Van Dinh Tran¹, Alessandro Sperduti², Rolf Backofen¹ ³, Fabrizio Costa⁴

Affiliations

¹ Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany.
² Department of Mathematics, University of Padova, Padua, Italy.
³ Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Germany.
⁴ Department of Computer Science, University of Exeter, Exeter, UK.

PMID: 31990289
DOI: 10.1093/bioinformatics/btaa008

Abstract

Motivation: Identifying disease-gene associations is a task of fundamental importance in human health research. A typical approach first encodes large gene/protein relational datasets as networks, since graphs represent relationships between objects naturally and intuitively, and then applies graph-based techniques to prioritize genes for subsequent low-throughput validation assays. Because different types of interactions between genes yield distinct gene networks, heterogeneous sources need to be integrated to improve the reliability of prioritization systems.

Results: We propose an approach based on three phases: first, we merge all sources into a single network; then, we partition the integrated network according to edge density, introducing a notion of edge type to distinguish the parts; finally, we employ a novel node kernel suitable for graphs with typed edges. We show how the node kernel can generate a large number of discriminative features that can be efficiently processed by linear regularized machine learning classifiers. We report state-of-the-art results on 12 disease-gene associations and on a time-stamped benchmark containing 42 newly discovered associations.
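The three phases can be illustrated with a minimal Python sketch. It is not the authors' implementation: the density criterion (both endpoints having at least `min_degree` neighbours), the per-type edge-count features and the plain dot-product kernel are simplifying assumptions chosen only to make the pipeline concrete, and all function names and the toy gene labels are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import chain


def merge_networks(*edge_lists):
    """Phase 1: union of several undirected edge lists (self-loops dropped)."""
    return {frozenset(e) for e in chain.from_iterable(edge_lists) if e[0] != e[1]}


def type_edges(edges, min_degree=3):
    """Phase 2 (assumed criterion): an edge is 'dense' when both endpoints
    have degree >= min_degree in the merged network, otherwise 'sparse'."""
    degree = Counter(node for edge in edges for node in edge)
    typed = {}
    for edge in edges:
        is_dense = all(degree[node] >= min_degree for node in edge)
        typed[edge] = "dense" if is_dense else "sparse"
    return typed


def node_features(typed_edges):
    """Phase 3a (simplified): sparse feature vector per node that counts
    its incident edges of each type."""
    feats = defaultdict(Counter)
    for edge, edge_type in typed_edges.items():
        for node in edge:
            feats[node][edge_type] += 1
    return feats


def node_kernel(feats, u, v):
    """Phase 3b (simplified): linear kernel, i.e. the dot product of the
    two nodes' sparse feature vectors."""
    return sum(count * feats[v][key] for key, count in feats[u].items())


# Two toy source networks over hypothetical genes A..E.
ppi = [("A", "B"), ("B", "C"), ("A", "C")]
coexpression = [("A", "C"), ("C", "D"), ("A", "D"), ("D", "E")]

merged = merge_networks(ppi, coexpression)
typed = type_edges(merged, min_degree=3)
feats = node_features(typed)
similarity_a_c = node_kernel(feats, "A", "C")
```

Because each node is mapped to an explicit sparse feature vector, the vectors could be passed directly to a linear regularized classifier, which is the property the abstract highlights; a realistic kernel would extract far richer neighbourhood features than the per-type edge counts used here.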

Availability and implementation: Source code: https://github.com/dinhinfotech/DiGI.git.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Gene Regulatory Networks*
Humans
Proteins
Reproducibility of Results
Software*

Substances

Proteins