Survey on graph embeddings and their applications to machine learning problems on graphs

PeerJ Comput Sci. 2021 Feb 4;7:e357. doi: 10.7717/peerj-cs.357. eCollection 2021.

Abstract

Dealing with relational data has always required significant computational resources, domain expertise and task-dependent feature engineering to incorporate structural information into a predictive model. In recent years, a family of automated graph feature engineering techniques has been proposed across different streams of literature. So-called graph embeddings provide a powerful tool for constructing vectorized feature spaces for graphs and their components, such as nodes, edges and subgraphs, while preserving inner graph properties. Using the constructed feature spaces, many machine learning problems on graphs can be solved with standard frameworks designed for vectorized feature representations. Our survey aims to describe the core concepts of graph embeddings and to provide several taxonomies for their description. We start with the methodological approach and distinguish three types of graph embedding models: those based on matrix factorization, on random walks, and on deep learning. Next, we describe how different types of networks affect the ability of models to incorporate structural and attributed data into a unified embedding. Going further, we perform a thorough evaluation of graph embedding applications to machine learning problems on graphs, including node classification, link prediction, clustering, visualization, compression, and a family of whole-graph embedding algorithms suitable for graph classification, similarity and alignment problems. Finally, we review the existing applications of graph embeddings in computer science domains, formulate open problems and provide experimental results explaining how different network properties affect the quality of graph embeddings in four classic machine learning problems on graphs: node classification, link prediction, clustering and graph visualization.
As a result, our survey covers a new, rapidly growing field of network feature engineering, presents an in-depth analysis of models organized by network type, and reviews a wide range of applications to machine learning problems on graphs.
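To make the matrix-factorization idea concrete, here is a minimal sketch, not the survey's own method. It assumes an undirected, unweighted graph given as a dense NumPy adjacency matrix. The helper names `svd_node_embeddings`, `two_cliques` and `cosine`, and the toy two-clique graph, are hypothetical illustrations. Each node becomes a row of a truncated SVD factor, after which ordinary vector tools such as cosine similarity apply directly.

```python
import numpy as np


def svd_node_embeddings(adjacency: np.ndarray, dim: int) -> np.ndarray:
    """Embed nodes via a rank-`dim` truncated SVD of the adjacency matrix.

    A ~= U_d S_d V_d^T; node i is represented by row i of U_d * sqrt(S_d).
    (Illustrative matrix-factorization embedding; hypothetical helper.)
    """
    u, s, _vt = np.linalg.svd(adjacency, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])


def two_cliques(k: int) -> np.ndarray:
    """Hypothetical toy graph: two k-node cliques joined by one bridge edge."""
    n = 2 * k
    a = np.zeros((n, n))
    a[:k, :k] = 1.0
    a[k:, k:] = 1.0
    np.fill_diagonal(a, 0.0)          # no self-loops
    a[k - 1, k] = a[k, k - 1] = 1.0   # single bridge edge between the cliques
    return a


def cosine(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


A = two_cliques(5)
Z = svd_node_embeddings(A, dim=2)

# Nodes 0 and 1 share a clique; node 9 sits in the other clique.
same_clique = cosine(Z[0], Z[1])
cross_clique = cosine(Z[0], Z[9])
print(f"same clique: {same_clique:.3f}, different cliques: {cross_clique:.3f}")
```

In this toy graph, the two leading singular vectors are close to block-constant, so nodes in the same clique receive nearly identical vectors while nodes in different cliques end up almost orthogonal. This is the kind of structure that a downstream classifier or clustering routine can then exploit.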

Keywords: Geometric deep learning; Graph embedding; Graph neural networks; Graph visualization; Knowledge representation; Link prediction; Machine learning; Network science; Node classification; Node clustering.

Grants and funding

The work of Nikita Nikitinsky on Section 6 was supported by the Russian Science Foundation grant 19-11-00281. The OA fee was covered under support of Faculty of Computer Science, HSE University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.