Accurately modeling biased random walks on weighted networks using node2vec

Bioinformatics. 2023 Jan 1;39(1):btad047. doi: 10.1093/bioinformatics/btad047.

Abstract

Motivation: Accurately representing biological networks in a low-dimensional space, also known as network embedding, is a critical step in network-based machine learning and is carried out widely using node2vec, an unsupervised method based on biased random walks. However, while many networks, including functional gene interaction networks, are dense, weighted graphs, node2vec is fundamentally limited in its ability to use edge weights during the biased random walk generation process, thus under-using all the information in the network.

Results: Here, we present node2vec+, a natural extension of node2vec that accounts for edge weights when calculating walk biases and reduces to node2vec in the cases of unweighted graphs or unbiased walks. Using two synthetic datasets, we empirically show that node2vec+ is more robust to additive noise than node2vec in weighted graphs. Then, using genome-scale functional gene networks to solve a wide range of gene function and disease prediction tasks, we demonstrate the superior performance of node2vec+ over node2vec in the case of weighted graphs. Notably, due to the limited amount of training data in the gene classification tasks, graph neural networks such as GCN and GraphSAGE are outperformed by both node2vec and node2vec+.

Availability and implementation: The data and code are available on GitHub at https://github.com/krishnanlab/node2vecplus_benchmarks. All additional data underlying this article are available on Zenodo at https://doi.org/10.5281/zenodo.7007164.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Epistasis, Genetic
  • Gene Regulatory Networks
  • Machine Learning*
  • Neural Networks, Computer*
  • Phenotype