Improving annotation propagation on molecular networks through random walks: introducing ChemWalker

Bioinformatics. 2023 Mar 1;39(3):btad078. doi: 10.1093/bioinformatics/btad078.

Abstract

Motivation: Annotation of mass signals remains the biggest bottleneck in the untargeted mass spectrometry analysis of complex mixtures. Molecular networks are increasingly being adopted by the mass spectrometry community as a tool for annotating large-scale experiments. We have previously shown that propagating annotations from spectral library matches across molecular networks can be automated using Network Annotation Propagation (NAP). One limitation of NAP is that information from spectral matches is propagated only locally, to the first neighbors of a spectral match. Here, we show that annotation propagation can be extended to nodes not directly connected to spectral matches by using random walks on graphs, and we introduce the ChemWalker Python library.

Results: Like NAP, ChemWalker relies on combinatorial in silico fragmentation results, computed by MetFrag against biologically relevant databases. Starting from the combination of a spectral network and the structural similarity among candidate structures, we used the MetFusion scoring function to create a weight function, producing a weighted graph. A random walk on this graph was then used to calculate the probability of 'walking' through a set of candidates, starting from seed nodes (represented by spectral library matches). This approach allowed information to propagate to nodes not directly connected to a spectral library match. Compared with NAP, ChemWalker offers improvements in running time, scalability and maintainability, and is available as a standalone Python package.
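To illustrate the idea of a random walk that starts from seed nodes on a weighted candidate graph, the following is a minimal sketch of a generic random walk with restart in pure Python. It is not ChemWalker's implementation: the function name `random_walk_with_restart`, the restart probability, the node labels and the edge weights are all assumptions made for illustration, and the weights here merely stand in for values that a MetFusion-style weight function might produce.

```python
def random_walk_with_restart(weights, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Generic random walk with restart (illustrative, not ChemWalker's code).

    weights: dict mapping every node to a dict {neighbor: edge_weight};
             every neighbor must itself be a key of `weights`.
    seeds:   nodes the walker restarts from (e.g. spectral library matches).
    Returns a dict mapping each node to its visiting probability.
    """
    nodes = list(weights)
    seeds = set(seeds)
    if not seeds or not seeds <= set(nodes):
        raise ValueError("seeds must be a non-empty subset of the graph nodes")

    # Restart distribution: uniform over the seed nodes.
    restart_vec = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(restart_vec)

    for _ in range(max_iter):
        # With probability `restart`, the walker jumps back to a seed.
        nxt = {n: restart * restart_vec[n] for n in nodes}
        for u in nodes:
            total = sum(weights[u].values())
            if total == 0:
                # Isolated node: return its probability mass to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * prob[u] * restart_vec[n]
                continue
            # Otherwise, move to a neighbor in proportion to the edge weight.
            for v, w in weights[u].items():
                nxt[v] += (1 - restart) * prob[u] * w / total
        delta = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if delta < tol:
            break
    return prob


# Hypothetical toy graph: "seed" is a spectral library match, A1/A2 are
# candidate structures for a neighboring node, and B1/B2 are candidates for
# a node that is NOT directly connected to the seed. Weights are made up.
graph = {
    "seed": {"A1": 0.9, "A2": 0.2},
    "A1": {"seed": 0.9, "B1": 0.8, "B2": 0.1},
    "A2": {"seed": 0.2, "B1": 0.1, "B2": 0.3},
    "B1": {"A1": 0.8, "A2": 0.1},
    "B2": {"A1": 0.1, "A2": 0.3},
}

probs = random_walk_with_restart(graph, seeds=["seed"])
ranking_a = sorted(["A1", "A2"], key=probs.get, reverse=True)
ranking_b = sorted(["B1", "B2"], key=probs.get, reverse=True)
print(ranking_a, ranking_b)
```

In this toy example, candidates B1 and B2 receive non-zero probability even though they have no edge to the seed, which is the kind of propagation beyond first neighbors described above. Ranking the candidates of each node by probability is one plausible way to use such scores; the actual ranking procedure in ChemWalker may differ.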

Availability and implementation: ChemWalker is freely available at https://github.com/computational-chemical-biology/ChemWalker.

Contact: ridasilva@usp.br.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Factual
  • Gene Library
  • Libraries*
  • Mass Spectrometry
  • Probability