Bioinformatics. 2017 May 15;33(10):1545-1553.
doi: 10.1093/bioinformatics/btx012.

Sparse Network Modeling and Metscape-Based Visualization Methods for the Analysis of Large-Scale Metabolomics Data

Free PMC article

Sumanta Basu et al. Bioinformatics. 2017.

Abstract

Motivation: Recent technological advances in mass spectrometry and the development of richer mass spectral libraries and data processing tools have enabled large-scale metabolic profiling. Biological interpretation of metabolomics studies relies heavily on knowledge-based tools that contain information about metabolic pathways. Incomplete coverage of different areas of metabolism and a lack of information about non-canonical connections between metabolites limit the scope of applications of such tools. Furthermore, the presence of a large number of unknown features, which cannot be readily identified but may nonetheless represent bona fide compounds, considerably complicates biological interpretation of the data.

Results: Leveraging recent developments in the statistical analysis of high-dimensional data, we developed a new Debiased Sparse Partial Correlation (DSPC) algorithm for estimating partial correlation networks and implemented it as a Java-based CorrelationCalculator program. We also introduce a new version of our previously developed tool Metscape that enables building and visualization of correlation networks. We demonstrate the utility of these tools by constructing biologically relevant networks and by aiding the identification of unknown compounds.
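
For orientation, the quantity that a partial correlation network is built from can be sketched in a few lines. The example below is not DSPC: it is a minimal illustration of the basic, unpenalized partial correlation obtained from the inverse covariance (precision) matrix, using the standard identity rho_ij = -theta_ij / sqrt(theta_ii * theta_jj). It only works when there are more samples than metabolites. The sparse estimation and debiasing steps that DSPC adds are not shown. All function names are hypothetical and are not taken from CorrelationCalculator.

```python
import math


def covariance_matrix(samples):
    """Sample covariance; `samples` has one row per sample, one column per metabolite."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting; raises if the matrix is singular."""
    p = len(matrix)
    # Augment the matrix with the identity, then reduce the left half to the identity.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular covariance: need more samples than variables")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def partial_correlations(samples):
    """Partial correlation of each pair, conditioning on all remaining variables."""
    theta = invert(covariance_matrix(samples))  # precision matrix
    p = len(theta)
    return [
        [1.0 if i == j else -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])
         for j in range(p)]
        for i in range(p)
    ]
```

When the number of metabolites approaches or exceeds the number of samples, the covariance matrix becomes ill-conditioned or singular and this direct inversion fails, which is the regime that sparse methods such as DSPC are designed for.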

Availability and implementation: http://metscape.med.umich.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1
Correlation analysis workflow. Metabolites with experimental measurements can be uploaded into CorrelationCalculator. The program can perform basic normalization and Pearson’s correlation analysis. A subset of data selected by setting a Pearson’s correlation coefficient threshold or the entire data set can be passed to DSPC. The results can be downloaded in tab-delimited format and sent directly to Metscape.
Fig. 2
Evaluation of PCOR and DSPC networks. (A) Relative sizes (number of nodes) of the PCOR and DSPC networks built with decreasing sample sizes. (B) Proportion of all edges recovered by PCOR and DSPC. (C) Proportion of the 10% most significant edges recovered by PCOR and DSPC.
Fig. 3
Validation of the DSPC algorithm. (A) Network built using the basic partial correlation algorithm with 1020 samples; (B) Network built using DSPC with 1020 samples; (C) Network built using the basic partial correlation algorithm with 200 samples; (D) Network built using DSPC with 200 samples. Metabolites are colored according to classes. Both methods perform well when the number of samples is large. DSPC can recover more significant edges when the number of samples is reduced.
Fig. 4
Partial correlation network of T1D differentiating metabolites. Node size indicates the direction of the change. Bold black border indicates significant metabolites. Colored edges had P-value < 0.003 and FDR-adjusted P-value < 0.5. Dotted lines represent edges with P-values < 0.2. Red and blue edges show positive and negative correlations.
Fig. 5
DSPC amino acid network. The network was constructed using the targeted and untargeted data. Nodes representing compounds measured in the targeted amino acid assay are shown in pink. Nodes that represent compounds measured using the untargeted RP platform have blue and red borders for those detected in negative and positive modes, respectively. Known compounds are shown as hexagons, whereas diamond-shaped nodes represent the unknown features, most of which were identified as adducts and in-source fragments of the highly correlated known compounds, as shown in Supplementary Figure 6 for the valine and proline subnetworks.
Fig. 6
Identification of unknown compounds. (A, D) Correlation networks containing both known metabolites and unknown features. (B, E) Overlaid extracted ion chromatograms showing identical retention times and peak shapes of the features in the plasma samples and in the spiked samples. (C, F) The mass spectra of the spiked and un-spiked samples closely matched the predicted isotope distribution computed using the molecular formula of the assigned metabolites.
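
The Pearson pre-filtering step in the Fig. 1 workflow, where a subset of the data is selected by a correlation coefficient threshold before being passed to DSPC, can be sketched as follows. The exact selection rule used by CorrelationCalculator is not described here, so this sketch rests on two assumptions: the threshold applies to the absolute value of the coefficient, and a metabolite is kept if it reaches the threshold with at least one other metabolite. The function names and the example data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (a constant input gives zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_by_pearson(data, threshold):
    """Return names of metabolites with |r| >= threshold against at least one other.

    `data` maps a metabolite name to its list of measurements across samples.
    The selection rule is an assumption made for illustration.
    """
    names = list(data)
    keep = set()
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if abs(pearson(data[first], data[second])) >= threshold:
                keep.update((first, second))
    # Preserve the original input order of the retained metabolites.
    return [name for name in names if name in keep]


# Hypothetical measurements for three metabolites across five samples.
example = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.0, 6.0, 8.0, 10.5],
    "C": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = select_by_pearson(example, 0.8)  # ["A", "B"]
```

Only A and B pass a threshold of 0.8 in this toy data set, because C is only weakly correlated with either of them. The retained subset would then be the input to the partial correlation step.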
