A map of mass spectrometry-based in silico fragmentation prediction and compound identification in metabolomics

Brief Bioinform. 2021 Mar 24;bbab073. doi: 10.1093/bib/bbab073. Online ahead of print.


Metabolomics, the comprehensive study of the metabolome, and lipidomics, the large-scale study of pathways and networks of cellular lipids, are major driving forces in enabling personalized medicine. Complicated and error-prone data analysis remains a bottleneck, however, especially for identifying novel metabolites. Comparing experimental mass spectra to curated databases of reference spectra has been the gold standard for compound identification, but constructing such databases is costly and time-consuming. Many software applications try to circumvent this process by using advances in computational methods, including quantum chemistry and machine learning, to simulate mass spectra through theoretical, so-called in silico, fragmentation of compounds. Other solutions work directly on experimental spectra and try to identify structural properties by investigating recurring patterns and the relationships between them. The considerable progress made in the field allows recent approaches to provide valuable clues that expedite the annotation of experimental mass spectra. This review sheds light on the individual strengths and weaknesses of these tools and attempts to evaluate them, particularly with regard to lipidomics, taking into account the complex mixtures found in biological samples and the inter-instrument variability of mass spectrometers.

Keywords: machine learning; mass spectrometry; metabolomics; quantum chemistry.