2015 Mar 16;11(3):e1004075.
doi: 10.1371/journal.pcbi.1004075. eCollection 2015 Mar.

Proportionality: a valid alternative to correlation for relative data

David Lovell et al. PLoS Comput Biol.

Abstract

In the life sciences, many measurement methods yield only the relative abundances of different components in a sample. With such relative (or compositional) data, differential expression needs careful interpretation, and correlation, a statistical workhorse for analyzing pairwise relationships, is an inappropriate measure of association. Using yeast gene expression data, we show how correlation can be misleading and present proportionality as a valid alternative for relative data. We show how the strength of proportionality between two variables can be meaningfully and interpretably described by a new statistic, ϕ, which can be used instead of correlation as the basis of familiar analyses and visualisation methods, including co-expression networks and clustered heatmaps. While the main aim of this study is to present proportionality as a means to analyse relative data, it also raises intriguing questions about the molecular mechanisms underlying the proportional regulation of a range of yeast genes.
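
The abstract names the statistic ϕ but does not give its formula. The sketch below assumes the form ϕ = var(log(x/y)) / var(log x), which is 0 when two variables are exactly proportional and grows as they depart from proportionality. It uses plain logarithms of made-up abundances; the function name `phi` and all data values are illustrative assumptions, and the paper's own analysis applies a log-ratio transformation first, which this sketch omits.

```python
import numpy as np

def phi(x, y):
    """Assumed form of the proportionality statistic:
    var(log(x / y)) / var(log(x)). Near 0 means x and y are proportional."""
    log_x = np.log(np.asarray(x, dtype=float))
    log_y = np.log(np.asarray(y, dtype=float))
    return np.var(log_x - log_y, ddof=1) / np.var(log_x, ddof=1)

# Hypothetical absolute abundances: y is exactly 3 times x.
absolute_x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
absolute_y = 3.0 * absolute_x

# Hypothetical per-sample totals turn absolute into relative abundances.
totals = np.array([100.0, 50.0, 400.0, 80.0, 1000.0])
rel_x = absolute_x / totals
rel_y = absolute_y / totals

# The ratio x/y is unchanged by dividing both by the same total,
# so the proportional pair still gives phi close to 0.
phi_proportional = phi(rel_x, rel_y)

# A pair with no proportional relationship gives a clearly positive phi.
unrelated_y = np.array([5.0, 1.0, 7.0, 2.0, 3.0]) / totals
phi_unrelated = phi(rel_x, unrelated_y)

print(phi_proportional, phi_unrelated)
```

Note that this form is not symmetric: `phi(x, y)` and `phi(y, x)` generally differ because the denominator uses only the first argument.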

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Why correlations between relative abundances tell us absolutely nothing.
These plots show two hypothetical mRNAs that are part of a larger total. (a) Seven pairs of relative abundances (mRNA1/total, mRNA2/total) are shown in red, representing the two mRNAs in seven different experimental conditions. The dotted reference line shows (mRNA1 + mRNA2)/total = 1. Rays from the origin through the red points show absolute abundances that could have given rise to these relative abundances, e.g., the blue, green or purple sets of points (whose Pearson correlations are −1, +1 and 0.0 respectively). (b) Relative abundances that are proportional must come from equivalent absolute abundances. Here the blue, green or purple sets of point pairs have the same proportionality as the pairs of relative abundances in red, though not necessarily the same order or dispersion.
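
The point of panel (a) can be reproduced numerically. The sketch below fixes seven hypothetical pairs of relative abundances and multiplies each pair by a different total, which moves each point along its ray from the origin. The values and variable names are illustrative assumptions, not data from the paper. One choice of totals yields absolute abundances with Pearson correlation −1 and another yields +1, while the ratio mRNA1/mRNA2 in each condition stays the same.

```python
import numpy as np

# Hypothetical relative abundances of two mRNAs in seven conditions.
rel1 = np.array([0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40])
rel2 = np.array([0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10])

def pearson(a, b):
    """Pearson correlation coefficient of two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

# Case 1: the same total in every condition.
# Absolute abundances are a rescaled copy of the relative ones.
totals_const = np.full(7, 100.0)
abs1_a, abs2_a = rel1 * totals_const, rel2 * totals_const

# Case 2: totals chosen so the absolute points fall on the line
# abs2 = 1 + 0.1 * abs1, i.e. total = 1 / (rel2 - 0.1 * rel1).
totals_line = 1.0 / (rel2 - 0.1 * rel1)
abs1_b, abs2_b = rel1 * totals_line, rel2 * totals_line

r_const = pearson(abs1_a, abs2_a)  # perfectly negative
r_line = pearson(abs1_b, abs2_b)   # perfectly positive

# Both sets of absolute abundances give back the same relative data,
# and the per-condition ratio mRNA1/mRNA2 never changes.
ratios_equal = np.allclose(abs1_a / abs2_a, abs1_b / abs2_b)

print(r_const, r_line, ratios_equal)
```

Because the ratio within each condition survives any choice of totals, statistics built on ratios remain interpretable for relative data where correlation does not.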
Fig 2. Fission yeast gene expression data of Marguerat et al. (a) Absolute and (b) relative abundances of 3031 yeast mRNAs over a 16-point time course.
y-axes are scaled logarithmically; x-axes are on a square-root scale for clarity. Each grey line represents the expression levels of a particular mRNA. The red and blue pairs of mRNAs are discussed later in this paper.
Fig 3. Correlations between relative abundances bear no relationship to the corresponding correlations between absolute abundances.
(a) The pair of mRNAs labeled in red in Fig. 2, shown on a linear scale. Values have been scaled and translated to have zero mean and unit variance. Upper panels show absolute abundances; the lower show relative abundances. The left panels show mRNA values over time; the right show the value of one mRNA plotted against the other at each time point. The correlation between the relative abundances is almost the complete opposite of that between the absolute abundances of this pair of mRNAs. (b) 2D histogram of the sample correlation coefficient observed for the relative abundances of a given pair of mRNAs, against the correlation observed for the absolute abundances of that same pair, over all pairs. The red and blue points correspond to the red and blue pairs of mRNA in Fig. 2. White contour lines are shown at intervals of 100 counts. The top marginal histogram shows that the absolute abundances of most pairs are very strongly correlated. The right marginal histogram shows “the negative bias difficulty” [4].

References

    1. van de Peppel J, Kemmeren P, van Bakel H, Radonjic M, van Leenen D, et al. (2003) Monitoring global messenger RNA changes in externally controlled microarray experiments. EMBO Reports 4: 387–393. doi: 10.1038/sj.embor.embor798
    2. Faust K, Sathirapongsasuti JF, Izard J, Segata N, Gevers D, et al. (2012) Microbial co-occurrence relationships in the human microbiome. PLoS Comput Biol 8: e1002606. doi: 10.1371/journal.pcbi.1002606
    3. Friedman J, Alm EJ (2012) Inferring correlation networks from genomic survey data. PLoS Comput Biol 8: e1002687. doi: 10.1371/journal.pcbi.1002687
    4. Aitchison J (1986) The statistical analysis of compositional data. Chapman & Hall, Ltd. doi: 10.1007/978-94-009-4109-0
    5. Pearson K (1897) Mathematical contributions to the theory of evolution. On a form of spurious correlation which may arise when indices are used in the measurement of organs. Proceedings of the Royal Society of London 60. doi: 10.1098/rspl.1896.0076