Construction and application of a mass spectral and retention time index database generated from plant GC/EI-TOF-MS metabolite profiles

Phytochemistry. 2003 Mar;62(6):887-900. doi: 10.1016/s0031-9422(02)00703-3.


The non-supervised construction of a mass spectral and retention time index database (MS/RI library) from a set of plant metabolic profiles covering major organs of potato (Solanum tuberosum), tobacco (Nicotiana tabacum), and Arabidopsis thaliana was demonstrated. Typically 300-500 mass spectral components with a signal-to-noise ratio ≥ 75 were obtained from GC/EI-time-of-flight (TOF)-MS metabolite profiles of methoxyaminated and trimethylsilylated extracts. Profiles from non-sample controls contained approximately 100 mass spectral components. An MS/RI library of 6205 mass spectral components was accumulated and applied to automated identification of the model compounds galactonic acid, a primary metabolite, and 3-caffeoylquinic acid, a secondary metabolite. Neither MS nor RI alone was sufficient for unequivocal identification of unknown mass spectral components. However, library searches with single bait mass spectra of the respective reference substance allowed clear identification by mass spectral match and RI window. Moreover, the hit lists of mass spectral searches were demonstrated to comprise candidate components of highly similar chemical nature. The search for the model compound galactonic acid allowed identification of gluconic and gulonic acid among the top-scoring mass spectral components. Equally successful was the exemplary search for 3-caffeoylquinic acid, which led to the identification of quinic acid and of the positional isomers 4-caffeoylquinic acid and 5-caffeoylquinic acid, among other still non-identified conjugates of caffeic and quinic acid. All identifications were verified by co-analysis of reference substances. Finally, we applied hierarchical clustering to a complete set of pair-wise mass spectral comparisons of unknown components and reference substances with known chemical structure.
We demonstrated that the resulting clustering tree depicted the chemical nature of the reference substances and that most of the nearest neighbours represented either identical components, as judged by co-elution, or conformational isomers exhibiting differential retention behaviour. Unknown components could be classified automatically by grouping with the respective branches and sub-branches of the clustering tree.
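
The combination of mass spectral match and RI window described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the cosine similarity used as the match score, the function names `cosine_match` and `search_library`, the `ri_window` and `min_match` thresholds, and the toy library entries with invented spectra and retention indices.

```python
from math import sqrt


def cosine_match(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts.

    Assumption: cosine similarity stands in for the match score; the paper's
    actual scoring function is not stated in the abstract.
    """
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = sqrt(sum(v * v for v in spec_a.values()))
    norm_b = sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_library(bait_spectrum, bait_ri, library, ri_window=5.0, min_match=0.8):
    """Return (score, name) hits that pass BOTH the RI window and the match threshold."""
    hits = []
    for entry in library:
        # RI filter: discard components eluting outside the window around the bait.
        if abs(entry["ri"] - bait_ri) > ri_window:
            continue
        # Spectral filter: keep only components that resemble the bait spectrum.
        score = cosine_match(bait_spectrum, entry["spectrum"])
        if score >= min_match:
            hits.append((score, entry["name"]))
    return sorted(hits, reverse=True)  # best match first


# Toy library with invented values (hypothetical, for illustration only).
library = [
    {"name": "component_A", "ri": 1990.0, "spectrum": {73: 100, 147: 40, 217: 30}},
    {"name": "component_B", "ri": 1993.0, "spectrum": {73: 100, 147: 38, 217: 33}},
    # Same spectrum as the bait but a very different RI: rejected by the RI window.
    {"name": "component_C", "ri": 2400.0, "spectrum": {73: 100, 147: 40, 217: 30}},
    # RI close to the bait but an unrelated spectrum: rejected by the spectral match.
    {"name": "component_D", "ri": 1991.0, "spectrum": {57: 100, 71: 80, 85: 60}},
]

bait_spectrum = {73: 100, 147: 40, 217: 30}
hits = search_library(bait_spectrum, 1991.0, library)
hit_names = [name for _, name in hits]

# Dropping the RI constraint lets the spectrally identical component_C back in.
no_ri_names = [name for _, name in search_library(bait_spectrum, 1991.0, library, ri_window=1e9)]
```

In this toy case, `component_C` shows why a spectral match alone is ambiguous, and `component_D` shows why an RI match alone is ambiguous. Only the two criteria together narrow the hit list to the closely related candidates.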

MeSH terms

  • Arabidopsis / chemistry*
  • Arabidopsis / metabolism
  • Chromatography, Gas*
  • Databases, Factual*
  • Solanum tuberosum / chemistry*
  • Solanum tuberosum / metabolism
  • Spectrometry, Mass, Electrospray Ionization*
  • Sugar Acids / analysis
  • Time Factors
  • Tobacco / chemistry*
  • Tobacco / metabolism


Substances

  • Sugar Acids
  • galactonic acid