PLoS Comput Biol. 2018 Apr 18;14(4):e1006089.
doi: 10.1371/journal.pcbi.1006089. eCollection 2018 Apr.

Propagating Annotations of Molecular Networks Using in Silico Fragmentation

Free PMC article

Ricardo R da Silva et al. PLoS Comput Biol.


The annotation of small molecules is one of the most challenging and important steps in untargeted mass spectrometry analysis, as most biological interpretations rely on structural annotations. Molecular networking has emerged as a structured way to organize and mine data from untargeted tandem mass spectrometry (MS/MS) experiments and has been widely applied to propagate annotations. However, propagation is done through manual inspection of the MS/MS spectra connected in the spectral networks and is only possible when a reference library spectrum is available. An alternative approach to annotating an unknown fragmentation mass spectrum is the use of in silico predictions. A key challenge of in silico annotation is the uncertainty about which structure in the predicted candidate list is correct. Here we show how molecular networking can be used to improve the accuracy of in silico predictions through propagation of structural annotations, even when there is no match to an MS/MS spectrum in spectral libraries. This is accomplished by creating a network consensus of re-ranked structural candidates, using the molecular network topology and structural similarity to improve in silico annotations. The Network Annotation Propagation (NAP) tool is accessible through the GNPS web-platform.

Conflict of interest statement

The authors have declared that no competing interests exist.


Fig 1. Representative scenarios of molecular networks obtained in an untargeted MS/MS experiment and possibilities for propagation.
a) Introduction of molecular networking and library matching. b, c and d represent varying degrees of spectral annotation in the network. e, f and g illustrate how the propagation of annotations can be used for each respective scenario (represented in the top panel). e) Fusion scoring: the spectral library hit nodes (red) are used to recalculate the score of candidate structures (grey shapes associated with nodes) for nodes that have structure candidates from the in silico fragmentation search (blue), based on their structural similarity (represented by the green heatmaps, where darker green indicates a higher degree of similarity). f) and g) Consensus scoring: a Consensus score can be used, based on the joint similarity of neighbors (pink nodes), for spectral library hits and in silico annotations (f), or for in silico annotations only, when no library match is present (g).
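The Fusion scoring idea in panel e can be sketched in a few lines of Python. This is a simplified illustration, not the scoring formula used by NAP: the weighted sum, the use of the maximum similarity, the representation of structures as sets of fingerprint bits, and the names `tanimoto`, `fusion_rerank` and `weight` are all assumptions made for the example.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical candidate record: (name, in silico score in [0, 1], fingerprint bits)
Candidate = Tuple[str, float, FrozenSet[int]]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def fusion_rerank(
    candidates: List[Candidate],
    library_hits: List[FrozenSet[int]],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank the candidates of one node using neighboring library hits.

    Each candidate's new score is a weighted sum (an assumption of this
    sketch) of its in silico score and its highest structural similarity
    to any library-hit structure in the neighborhood.
    """
    rescored = []
    for name, score, fingerprint in candidates:
        similarity = max(
            (tanimoto(fingerprint, hit) for hit in library_hits), default=0.0
        )
        rescored.append((name, (1 - weight) * score + weight * similarity))
    # Highest combined score first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy data: candidate_A has the better in silico score, but candidate_B
# shares three of five fingerprint bits with the neighboring library hit.
candidates = [
    ("candidate_A", 0.9, frozenset({1, 2, 3})),
    ("candidate_B", 0.8, frozenset({4, 5, 6, 7})),
]
library_hits = [frozenset({4, 5, 6, 8})]
ranked = fusion_rerank(candidates, library_hits)
print(ranked[0][0])  # candidate_B
```

In this toy case the candidate that resembles the library hit overtakes the one that scored best on fragmentation alone, which is the behavior the caption describes.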
Fig 2. NAP re-ranking assessment using the 5,467 spectra of the NIST17 [M+H]+ benchmark data set that have known nearest neighbors in the molecular network.
a) The impact of the n-first parameter on the percentage of correct annotations for network Consensus scoring, where n-first is the number of top-ranked candidate structures (from 5 to 20) considered from the neighbor nodes during Consensus scoring. b) Percentage of correct annotations ranking for each method. c) Number of spectra with improved ranking of correct annotations, that is, spectra for which Fusion or Consensus scoring ranked the correct structure better than MetFrag.
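The role of the n-first parameter in panel a can be illustrated with a small Python sketch. It is a hedged approximation rather than NAP's implementation: the mean-similarity aggregation, the weighted sum, the set-based fingerprints, and the names `consensus_rerank`, `n_first` and `weight` are assumptions chosen for the example.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical candidate record: (name, in silico score in [0, 1], fingerprint bits)
Candidate = Tuple[str, float, FrozenSet[int]]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def consensus_rerank(
    candidates: List[Candidate],
    neighbor_candidates: List[List[Candidate]],
    n_first: int = 10,
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank one node's candidates against its neighbors' candidates.

    Each inner list of neighbor_candidates must be sorted best-first.
    Only the n_first top-ranked candidates of each neighbor are used.
    """
    pool = [
        fingerprint
        for ranked in neighbor_candidates
        for _, _, fingerprint in ranked[:n_first]
    ]
    rescored = []
    for name, score, fingerprint in candidates:
        if pool:
            mean_sim = sum(tanimoto(fingerprint, other) for other in pool) / len(pool)
        else:
            mean_sim = 0.0
        rescored.append((name, (1 - weight) * score + weight * mean_sim))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy node with two candidates and two neighbors, each with a ranked list.
node = [
    ("X", 0.9, frozenset({1, 2})),
    ("Y", 0.7, frozenset({10, 11, 12})),
]
neighbors = [
    [("n1a", 0.8, frozenset({10, 11, 12})), ("n1b", 0.5, frozenset({1, 2}))],
    [("n2a", 0.9, frozenset({10, 11, 13})), ("n2b", 0.4, frozenset({1, 2}))],
]
top_with_1 = consensus_rerank(node, neighbors, n_first=1)[0][0]
top_with_2 = consensus_rerank(node, neighbors, n_first=2)[0][0]
print(top_with_1, top_with_2)  # Y X
```

With these toy values the top-ranked candidate changes when n_first goes from 1 to 2, which shows why the choice of n-first can affect the percentage of correct annotations.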
Fig 3. Schematic representation of the clustering of molecular structure candidates by structural similarity and of the dynamic cluster assignment for the 5,467 NIST17 [M+H]+ unique compound spectra with ClassyFire chemical taxonomy.
The structurally related group of candidate structures, detected by unsupervised clustering, that contains the first candidates ranked by network Fusion and network Consensus often contains the maximum common substructure shared between in silico candidates (for candidates inside the group defined by clustering) and the known structure in the validation dataset (numbers shown at the bottom). The structurally related groups are highlighted by colors; inside each group we also show class assignments. The classes/structures that were predicted by unsupervised clustering were compared with the known compound.
Fig 4. N-acetyl-sugar metabolite family: Annotation of the library match to N-acetylgalactosamine propagates through the network.
N-acetylglucosamine-containing top-ranked candidates are represented in larger boxes. a) Result from MetFrag. b) Result from network Consensus. Highlighted in green is the maximum common substructure (MCSS) of each node to the reference library (green border node) for the seven N-acetylglucosamine-related molecules.
Fig 5. Network annotation of the E. dendroides plant extracts with NAP, visualized in Cytoscape with the ChemViz2 plug-in.
a) Result of the spectral library annotation using public spectral libraries available on GNPS. b) MetFrag annotation with top scoring molecules from bio-database. c) NAP annotation with top scoring matches using NAP network Fusion scoring, showing candidate lists associated to two nodes.
Fig 6. Network annotation of the T. septentrionalis fungus garden extracts with NAP, visualized in Cytoscape with the ChemViz2 plug-in.
a) Group of nodes in which the annotation can be directly propagated from spectral library matches (top) and others in which the inspection of neighbor candidate structures can improve the annotation (bottom). Nodes with a blue background have candidate structures from in silico annotation, and nodes with a green background have candidates from spectral library annotation. b) MetFrag top-ranked candidates. c) NAP consensus top-ranked candidates. d) MetFrag top-ranked candidates. e) NAP consensus top-ranked candidates.

Cited by 13 articles


