Anal Chem. 2017 Oct 3;89(19):10397-10406.
doi: 10.1021/acs.analchem.7b02380. Epub 2017 Sep 15.

Systems-Level Annotation of a Metabolomics Data Set Reduces 25 000 Features to Fewer Than 1000 Unique Metabolites

Free PMC article

Nathaniel G Mahieu et al. Anal Chem. 2017.

Abstract

When using liquid chromatography/mass spectrometry (LC/MS) to perform untargeted metabolomics, it is now routine to detect tens of thousands of features from biological samples. Poor understanding of the data, however, has complicated interpretation and masked the number of unique metabolites actually being measured in an experiment. Here we place an upper bound on the number of unique metabolites detected in Escherichia coli samples analyzed with one untargeted metabolomics method. We first group multiple features arising from the same analyte, which we call "degenerate features", using a context-driven annotation approach. Surprisingly, this analysis revealed thousands of previously unreported degeneracies that reduced the number of unique analytes to ∼2961. We then applied an orthogonal approach to remove nonbiological features from the data using the 13C-based credentialing technology. This further reduced the number of unique analytes to fewer than 1000. Our 90% reduction in data is 5-fold greater than that of previously published studies. On the basis of the results, we propose an alternative approach to untargeted metabolomics that relies on thoroughly annotated reference data sets. To this end, we introduce the creDBle database (http://creDBle.wustl.edu), which contains accurate mass, retention time, and MS/MS fragmentation data as well as annotations of all credentialed features.

Figures

Figure 1.
Our informatic workflow. Raw data were processed with in-house algorithms to first identify high-quality, consensus features (i.e., recurring features between replicates) and discriminate against processing artifacts. This consensus data set was further characterized by mz.unity (to estimate signal degeneracy) and credentialing (to estimate contaminants and artifacts). The resulting annotated data set was catalogued in the creDBle database.
Figure 2.
Plotting the maximum number of unique analytes detected throughout the steps of our annotation process. (A) Removal of features occurring in the blank. (B) Features are grouped as additional relationships are annotated. This reduces the maximum number of unique analytes. When a feature group contains multiple features, it is shown in green. When a feature group contains only a single feature (i.e., is a singlet), then it is shown in pink. Relationships from left to right: no relationships; isotopes; charge carriers; neutral losses; complex dimers (single and multi-analyte dimers); frequent intrinsic relationships; situational adducts (background). (C) Similar annotation of features that were credentialed.
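The grouping logic described in this caption can be illustrated with a small sketch. It is not the authors' mz.unity implementation: it assumes only that each annotated relationship links two feature indices, and it merges linked features with a union-find structure. The function name `group_features` and the toy relationship lists are hypothetical.

```python
from collections import Counter


def group_features(n_features, relationships):
    """Merge features linked by annotated relationships (union-find).

    Returns (number of groups, number of singlet groups). The number of
    groups is an upper bound on the number of unique analytes.
    """
    parent = list(range(n_features))

    def find(i):
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in relationships:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    sizes = Counter(find(i) for i in range(n_features))
    n_singlets = sum(1 for size in sizes.values() if size == 1)
    return len(sizes), n_singlets


# Hypothetical toy data: six features, indexed 0-5.
isotopes = [(0, 1)]          # feature 1 is an isotope peak of feature 0
adducts = [(1, 2), (3, 4)]   # charge-carrier relationships

print(group_features(6, []))                  # no relationships annotated
print(group_features(6, isotopes))            # isotopes only
print(group_features(6, isotopes + adducts))  # isotopes and adducts
```

Each added relationship type can only keep the group count the same or lower it, which is why the maximum number of unique analytes falls from left to right in panel B.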
Figure 3.
Detection of frequent intrinsic relationships. (A) The Gaussian kernel density of all pairwise peak relationships in the data set. Inset is a zoomed-in section around 14 Da. Known relationships are labeled with a formula. Unknown relationships are labeled with mass and charge transitions [m, z]. (B) Peak pairs of the recovered frequent intrinsic relationship [23.0760, 0] plotted in mass/charge and retention time (points). Line segments connect pairs with the specified spacing.
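A rough sketch of the idea behind panel A: take every pairwise m/z difference in a peak list and estimate its density with a Gaussian kernel, so that spacings recurring across many peak pairs appear as sharp maxima. This is a simplified illustration, not the authors' code. It ignores charge and retention time, and the peak list, grid, and bandwidth are made-up assumptions.

```python
import math
from itertools import combinations


def pairwise_differences(mz_values):
    """Absolute m/z difference for every pair of peaks."""
    return [abs(a - b) for a, b in combinations(mz_values, 2)]


def gaussian_kde(samples, grid, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated on `grid`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


# Hypothetical peak list in which a 14.0157 spacing recurs three times.
toy_mz = [100.0, 114.0157, 128.0314, 200.0, 214.0157, 250.5]
diffs = pairwise_differences(toy_mz)

# Evaluate the density in a window around 14 Da, as in the figure inset.
grid = [14.0 + 0.001 * k for k in range(31)]
density = gaussian_kde(diffs, grid, bandwidth=0.002)
peak = grid[density.index(max(density))]
print(f"most frequent spacing near {peak:.3f}")
```

On real data the bandwidth would need to reflect the instrument's mass accuracy; the value used here is arbitrary.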
Figure 4.
Situational adducts. (A) The persistent background spectrum observed in this experiment. The three indicated background peaks have mass spacings that correspond to a methylene group. These are likely an alkyl amine series with carbon numbers 5, 6, and 7. When these background species adduct with an analyte, situational adducts are formed. (B) An example of a situational adduct formed between background ion 102.1280 (a six carbon alkyl amine) and an eluting analyte. This process likely occurs with all three alkyl amine species throughout the run, giving rise to the frequent intrinsic relationships of mass 14.0157 (see Table 1, Row 8).
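The masses quoted in this caption can be checked with a short calculation. The sketch below assumes the background species are saturated primary alkyl amines (neutral formula CnH(2n+3)N) observed as protonated ions; the caption says only that they are "likely an alkyl amine series", so the exact formula is an assumption. The constants are standard monoisotopic masses.

```python
# Monoisotopic masses in Da.
C = 12.0
H = 1.00782503
N = 14.0030740
ELECTRON = 0.00054858


def protonated_alkylamine_mz(n_carbons):
    """m/z of [M+H]+ for an assumed saturated alkyl amine CnH(2n+3)N."""
    n_hydrogens = 2 * n_carbons + 4  # neutral hydrogens plus one proton
    return n_carbons * C + n_hydrogens * H + N - ELECTRON


series = {n: protonated_alkylamine_mz(n) for n in (5, 6, 7)}
for n, mz in series.items():
    print(f"C{n} alkyl amine [M+H]+: {mz:.4f}")

# Spacing between adjacent members of the series is one methylene (CH2).
methylene = C + 2 * H
print(f"CH2 spacing: {methylene:.4f}")
```

Under these assumptions the six-carbon ion comes out within a few ppm of the observed background ion at 102.1280, and the methylene spacing rounds to the 14.0157 relationship cited in the caption.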
Figure 5.
Schematic showing how background ions give rise to frequent intrinsic relationships. Analyte A is detected as an adduct of each background ion (B1 and B2). The spacing between the adducts (A+B1-H and A+B2-H) is equal to the spacing between the background ions.
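The equality stated in this caption follows from simple mass bookkeeping. The derivation below is a sketch that assumes both adducts carry the same charge and that adduct mass is the sum of the component masses minus one hydrogen; the notation m(X) for the mass of species X is introduced here for illustration.

```latex
\begin{align}
m(A + B_1 - H) - m(A + B_2 - H)
  &= \bigl[m(A) + m(B_1) - m(H)\bigr] - \bigl[m(A) + m(B_2) - m(H)\bigr] \\
  &= m(B_1) - m(B_2)
\end{align}
```

The analyte mass m(A) cancels, so the same spacing appears for every analyte that adducts with both background ions. For two background ions that differ by one methylene group, this spacing is the 14.0157 Da relationship noted in Figure 4.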
