Accurately modeling biased random walks on weighted networks using node2vec

Renming Liu; Matthew Hirn; Arjun Krishnan

doi:10.1093/bioinformatics/btad047

Bioinformatics. 2023 Jan 1;39(1):btad047. doi: 10.1093/bioinformatics/btad047.

Authors

Renming Liu¹, Matthew Hirn¹ ² ³, Arjun Krishnan¹ ⁴

Affiliations

¹ Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, MI 48824, USA.
² Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA.
³ Center for Quantum Computing, Science & Engineering, Michigan State University, East Lansing, MI 48824, USA.
⁴ Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA.

Abstract

Motivation: Accurately representing biological networks in a low-dimensional space, also known as network embedding, is a critical step in network-based machine learning and is carried out widely using node2vec, an unsupervised method based on biased random walks. However, while many networks, including functional gene interaction networks, are dense, weighted graphs, node2vec is fundamentally limited in its ability to use edge weights during the biased random walk generation process, and therefore cannot exploit all of the information in the network.

Results: Here, we present node2vec+, a natural extension of node2vec that accounts for edge weights when calculating walk biases and reduces to node2vec in the cases of unweighted graphs or unbiased walks. Using two synthetic datasets, we empirically show that node2vec+ is more robust to additive noise than node2vec in weighted graphs. Then, using genome-scale functional gene networks to solve a wide range of gene function and disease prediction tasks, we demonstrate the superior performance of node2vec+ over node2vec in the case of weighted graphs. Notably, due to the limited amount of training data in the gene classification tasks, graph neural networks such as GCN and GraphSAGE are outperformed by both node2vec and node2vec+.

Availability and implementation: The data and code are available on GitHub at https://github.com/krishnanlab/node2vecplus_benchmarks. All additional data underlying this article are available on Zenodo at https://doi.org/10.5281/zenodo.7007164.

Supplementary information: Supplementary data are available at Bioinformatics online.
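The limitation described under Motivation can be illustrated with a minimal Python sketch of the second-order walk bias from the original node2vec formulation (return parameter p, in-out parameter q). This is not the node2vec+ code from the repository above. The function names, the dict-of-dicts graph layout, and the toy graph are assumptions made for illustration only. The point is that the bias for a candidate node depends only on whether an edge to the previous node exists, not on how strong that edge is.

```python
def node2vec_bias(prev, cand, adj, p=1.0, q=1.0):
    """Standard node2vec search bias for stepping to `cand` after arriving from `prev`."""
    if cand == prev:          # distance 0: return to the previous node
        return 1.0 / p
    if cand in adj[prev]:     # distance 1: any edge counts, its weight is ignored
        return 1.0
    return 1.0 / q            # distance 2: move outward


def transition_probs(prev, cur, adj, p=1.0, q=1.0):
    """Normalized second-order transition probabilities out of `cur`."""
    scores = {
        nxt: node2vec_bias(prev, nxt, adj, p, q) * w
        for nxt, w in adj[cur].items()
    }
    total = sum(scores.values())
    return {nxt: s / total for nxt, s in scores.items()}


def make_graph(w_tx):
    """Hypothetical toy graph; only the weight of edge (t, x) varies."""
    return {
        "t": {"v": 1.0, "x": w_tx},
        "v": {"t": 1.0, "x": 1.0, "y": 1.0},
        "x": {"t": w_tx, "v": 1.0},
        "y": {"v": 1.0},
    }


# Walk arrived at v from t; compare a very weak and a strong (t, x) edge.
probs_weak = transition_probs("t", "v", make_graph(0.01), p=1.0, q=0.5)
probs_strong = transition_probs("t", "v", make_graph(1.0), p=1.0, q=0.5)
print(probs_weak == probs_strong)
```

In this sketch the two distributions are identical even though the (t, x) edge differs by a factor of 100, so a near-noise edge in a dense weighted graph is treated the same as a strong one when the bias is computed. With p = q = 1 every bias factor equals 1 and the walk becomes a plain weight-proportional first-order walk, which is the unbiased case mentioned under Results.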

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Epistasis, Genetic
Gene Regulatory Networks
Machine Learning*
Neural Networks, Computer*
Phenotype
