Development of Ecom₅₀ and retention index models for nontargeted metabolomics: identification of 1,3-dicyclohexylurea in human serum by HPLC/mass spectrometry

J Chem Inf Model. 2012 May 25;52(5):1222-37. doi: 10.1021/ci300092s. Epub 2012 Apr 27.


The goal of many metabolomic studies is to identify the molecular structure of endogenous molecules that are differentially expressed among sample or treatment groups. The identified compounds can then be used to gain an understanding of disease mechanisms. Unfortunately, despite recent advances in a variety of analytical techniques, small molecule (<1000 Da) identification remains difficult. Rarely can a chemical structure be determined from experimental "features" such as retention time, exact mass, and collision-induced dissociation spectra. Without a known structure, biological significance remains obscure. In this study, we explore an identification method in which the measured exact mass of an unknown is used to query available chemical databases to compile a list of candidate compounds. Predictions are made for the candidates using models of experimental features that have been measured for the unknown. The predicted values are used to filter the candidate list by eliminating compounds whose predicted values differ substantially from the values measured for the unknown. The intent is to reduce the list of candidates to a reasonable number that can be obtained and measured for confirmation. To facilitate this exploration, we measured data and created models for two experimental features: MS Ecom₅₀ (the energy in electronvolts required to fragment 50% of a selected precursor ion) and HPLC retention index. Using a data set of 52 compounds, Ecom₅₀ models were developed based on both Molconn and CODESSA structural descriptors. These models gave r² values of 0.89 to 0.94 depending on the number of inputs, the modeling algorithm chosen, and whether neutral or protonated structures were used. The retention index model was developed with 400 compounds using a back-propagation artificial neural network and 33 Molconn structure descriptors. External validation gave a v² = 0.87 and standard error of 38 retention index units.
As a test of the validity of the filtering approach, the Ecom₅₀ and retention index models, along with exact mass and collision-induced dissociation spectra matching, were used to identify 1,3-dicyclohexylurea in human plasma. This compound was not previously known to exist in human biofluids, and its elemental formula was identical to that of 315 other candidate compounds downloaded from PubChem. These results suggest that the use of Ecom₅₀ and retention index predictive models can improve nontargeted metabolite structure identification using HPLC/MS-derived structural features.
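
The candidate-filtering step described in the abstract might be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Candidate` class, the `filter_candidates` function, the candidate names, all numeric values, and the tolerance windows are hypothetical. The retention index window of 76 units assumes a cutoff of twice the reported standard error of 38 units, which is an assumption rather than a threshold stated in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A database hit sharing the unknown's exact mass (hypothetical record)."""
    name: str
    predicted_ri: float      # model-predicted HPLC retention index
    predicted_ecom50: float  # model-predicted Ecom50, in eV


def filter_candidates(candidates, measured_ri, measured_ecom50,
                      ri_tol, ecom50_tol):
    """Keep candidates whose predicted features both lie within the
    given tolerance of the values measured for the unknown."""
    return [
        c for c in candidates
        if abs(c.predicted_ri - measured_ri) <= ri_tol
        and abs(c.predicted_ecom50 - measured_ecom50) <= ecom50_tol
    ]


# Made-up candidates and measurements, for illustration only.
candidates = [
    Candidate("candidate_A", predicted_ri=510.0, predicted_ecom50=1.20),
    Candidate("candidate_B", predicted_ri=655.0, predicted_ecom50=1.25),
    Candidate("candidate_C", predicted_ri=520.0, predicted_ecom50=2.10),
]

kept = filter_candidates(
    candidates,
    measured_ri=500.0,
    measured_ecom50=1.30,
    ri_tol=76.0,       # assumed: 2 x the reported standard error of 38 units
    ecom50_tol=0.3,    # assumed window in eV
)
kept_names = [c.name for c in kept]
print(kept_names)
```

In this toy example, candidate_B is rejected on retention index and candidate_C on Ecom₅₀, so only candidate_A would remain to be obtained and measured for confirmation.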

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromatography, High Pressure Liquid*
  • Databases, Factual
  • Humans
  • Mass Spectrometry*
  • Metabolomics / methods*
  • Models, Biological*
  • Urea / analogs & derivatives*
  • Urea / blood
  • Urea / chemistry

Substances

  • Urea
  • 1,3-dicyclohexylurea