In Silico Structure Predictions for Non-targeted Analysis: From Physicochemical Properties to Molecular Structures

J Am Soc Mass Spectrom. 2022 Jul 6;33(7):1134-1147. doi: 10.1021/jasms.1c00386. Epub 2022 Jun 1.


Despite important advances in high-resolution mass spectrometry (HRMS) and its application to non-targeted analysis (NTA), the compounds identified in biological and environmental samples often account for no more than 5% of the detected chemical features. Our aim was to develop a computational pipeline that combines HRMS data with physicochemical properties (equilibrium partition ratios between organic solvents and water; Ksolvent-water) to propose molecular structures for detected chemical features. Because these physicochemical properties often differ sufficiently across isomers, together they can form a unique profile for each isomer, which we describe as the "physicochemical fingerprint". In our study, we used a comprehensive database of compounds previously reported in human blood and collected their Ksolvent-water values for 129 partitioning systems. We used RDKit to calculate the number of RDKit fragments and the number of RDKit bits per molecule. We then developed and trained an artificial neural network that took the physicochemical fingerprint of a chemical feature as input and predicted the number and types of RDKit fragments and RDKit bits present in that structure. These predictions were then used to search the database and propose chemical structures. The average success rate of predicting the correct chemical structure ranged from 60 to 86% for the training set and from 48 to 81% for the testing set. These observations suggest that physicochemical fingerprints can assist in the identification of compounds in NTA and substantially increase the number of identified compounds.
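
The final step of the pipeline, using predicted fragment counts to search a database and propose structures, can be illustrated with a minimal sketch. The abstract does not state how predictions are matched to database entries, so the Manhattan distance used here is an assumption, as are the helper names (`round_prediction`, `manhattan`, `rank_candidates`) and the three-compound database. The keys follow the naming of RDKit fragment descriptors, but the counts are illustrative values entered by hand, not computed with RDKit, and the "network output" is a made-up vector standing in for a trained model.

```python
from typing import Dict, List, Tuple

FragmentCounts = Dict[str, int]

# Hypothetical mini-database of C3H6O isomers (same formula, so HRMS alone
# cannot separate them). Counts are illustrative, not computed with RDKit.
DATABASE: Dict[str, FragmentCounts] = {
    "acetone": {"fr_ketone": 1, "fr_aldehyde": 0, "fr_Al_OH": 0},
    "propanal": {"fr_ketone": 0, "fr_aldehyde": 1, "fr_Al_OH": 0},
    "allyl alcohol": {"fr_ketone": 0, "fr_aldehyde": 0, "fr_Al_OH": 1},
}


def round_prediction(predicted: Dict[str, float]) -> FragmentCounts:
    """Turn continuous network outputs into non-negative integer counts."""
    return {name: max(0, round(value)) for name, value in predicted.items()}


def manhattan(a: FragmentCounts, b: FragmentCounts) -> int:
    """Sum of absolute count differences; missing fragments count as 0."""
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys)


def rank_candidates(
    predicted: Dict[str, float], database: Dict[str, FragmentCounts]
) -> List[Tuple[str, int]]:
    """Rank database entries by distance to the predicted fragment counts.

    Assumption: closest match first, ties broken alphabetically by name.
    """
    target = round_prediction(predicted)
    scored = [(name, manhattan(target, counts)) for name, counts in database.items()]
    return sorted(scored, key=lambda item: (item[1], item[0]))


# Made-up output standing in for the trained neural network.
network_output = {"fr_ketone": 0.9, "fr_aldehyde": 0.2, "fr_Al_OH": 0.1}
ranking = rank_candidates(network_output, DATABASE)
print(ranking)
```

In this toy case the predicted counts round to one ketone group and nothing else, so acetone is ranked first with distance 0 and the other two isomers tie at distance 2. A realistic search would also compare the predicted RDKit bits and would first restrict the candidates to those consistent with the HRMS data.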

MeSH terms

  • Humans
  • Isomerism
  • Molecular Structure
  • Solvents / chemistry
  • Water* / chemistry


Substances

  • Solvents
  • Water