Measuring correlations in metabolomic networks with mutual information

Genome Inform. 2008:20:112-22.

Abstract

Non-linear correlation measures based on mutual information are evaluated for quantifying statistical dependencies among metabolic data points in two-dimensional space. While the Pearson correlation coefficient is rigorously applicable only to strictly linear correlations with Gaussian noise, the mutual information coefficient is more generally valid. Here, we use recent distribution-free (non-parametric) mutual information estimators based on k-nearest neighbor distances. The mutual information algorithm of Kraskov et al. is found to yield estimates with low systematic and statistical error. The significance of the different methods is probed on artificial data sets of tens to hundreds of data points, a size currently typical of metabolomic data. We analyze experimental data on metabolite concentrations from Arabidopsis thaliana using these procedures. Mutual information detected additional non-linear correlations that the Pearson coefficient could not detect.
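As an illustration of the k-nearest-neighbor approach described above, the following is a minimal sketch of the first estimator proposed by Kraskov et al., I(X;Y) ≈ ψ(k) + ψ(N) − ⟨ψ(n_x + 1) + ψ(n_y + 1)⟩, compared with the Pearson coefficient on synthetic data. It is not the authors' implementation: the function names, the choice k = 3, the brute-force neighbor search, and the noisy-parabola test data are all assumptions made for this example, and the abstract does not state which estimator variant or parameters were used.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma function psi(n) for a positive integer n (harmonic-sum identity)."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def ksg_mutual_information(xs, ys, k=3):
    """Estimate I(X;Y) in nats from paired samples (assumes no tied values).

    Brute-force O(N^2 log N) search, adequate for tens to hundreds of points.
    """
    n = len(xs)
    if n != len(ys) or n <= k:
        raise ValueError("need paired samples with more than k points")
    total = 0.0
    for i in range(n):
        # Max-norm (Chebyshev) distance from point i to every other point.
        dists = sorted(
            max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
            for j in range(n) if j != i
        )
        eps = dists[k - 1]  # distance to the k-th nearest neighbor in (x, y)
        # Count marginal neighbors strictly closer than eps in each coordinate.
        n_x = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        total += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - total / n

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical test data: a noisy parabola (non-linear dependence) and an
# independent pair, each with 200 points.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
y_parabola = [xi ** 2 + rng.gauss(0.0, 0.05) for xi in x]
y_independent = [rng.uniform(-1.0, 1.0) for _ in range(200)]

r_parabola = pearson(x, y_parabola)
mi_parabola = ksg_mutual_information(x, y_parabola)
mi_independent = ksg_mutual_information(x, y_independent)
print(r_parabola, mi_parabola, mi_independent)
```

In this toy setting the parabola is symmetric about zero, so the Pearson coefficient stays small while the mutual information estimate is clearly larger than for the independent pair. For real data, a permutation test (shuffling one variable and re-estimating) is one common way to judge whether an estimate is significantly above zero.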

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • False Positive Reactions
  • Metabolism / genetics
  • Metabolome / genetics*
  • Models, Genetic
  • Models, Statistical
  • Probability
  • Reproducibility of Results
  • Sample Size
  • Statistics, Nonparametric
  • Systems Biology / methods*