A causal inference perspective on the analysis of compositional data

Int J Epidemiol. 2020 Aug 1;49(4):1307-1313. doi: 10.1093/ije/dyaa021.


Background: Compositional data comprise the parts of some whole, such that the parts sum to that whole. They are prevalent in many epidemiological contexts. Many of the challenges associated with analysing compositional data have been discussed previously; here, we examine them within a formal causal framework using directed acyclic graphs (DAGs).

Methods: We depict compositional data using DAGs and identify two distinct effect estimands in the generic case: (i) the total effect, and (ii) the relative effect. We consider each in the context of three specific example scenarios involving compositional data: (1) the relationship between the economically active population and area-level gross domestic product; (2) the relationship between fat consumption and body weight; and (3) the relationship between time spent sedentary and body weight. For each, we consider the distinct interpretation of each effect, and the resulting implications for related analyses.

Results: For scenarios (1) and (2), both the total and relative effects may be identifiable and causally meaningful, depending upon the specific question of interest. For scenario (3), only the relative effect is identifiable. In all scenarios, the relative effect represents a joint effect, and thus requires careful interpretation.

Conclusions: DAGs are useful for considering causal effects for compositional data. In all analyses involving compositional data, researchers should explicitly consider and declare which causal effect is sought and how it should be interpreted.
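Illustration: The distinction between the total and relative effects can be made concrete with a small simulation. The sketch below is not taken from the paper. It assumes a hypothetical two-part whole (whole = part_a + part_b), independent parts, no confounding, and a linear outcome model with arbitrary coefficients beta_a and beta_b. All names and values are illustrative assumptions. Under these assumptions, regressing the outcome on part_a alone should approximately recover beta_a (the total effect, where the whole is allowed to rise with the part), whereas additionally conditioning on the whole should approximately recover beta_a - beta_b (the relative effect, a joint effect of raising part_a while lowering part_b by the same amount).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical two-part composition: the whole is the sum of its parts.
part_a = rng.normal(10.0, 2.0, n)
part_b = rng.normal(20.0, 3.0, n)  # drawn independently of part_a (assumption)
whole = part_a + part_b

# Assumed linear outcome model; coefficients are arbitrary illustrative values.
beta_a, beta_b = 2.0, 0.5
y = beta_a * part_a + beta_b * part_b + rng.normal(0.0, 1.0, n)


def ols(outcome, *covariates):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(outcome)), *covariates])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


# Total effect: the whole is free to increase along with part_a.
total_effect = ols(y, part_a)[1]

# Relative effect: the whole is held fixed, so part_a can only increase
# at the expense of part_b (a substitution, i.e. a joint effect).
relative_effect = ols(y, part_a, whole)[1]

print(f"total effect of part_a:    {total_effect:.3f}")
print(f"relative effect of part_a: {relative_effect:.3f}")
```

In this sketch the two coefficients differ even though both are estimated without bias, which is why the estimand being sought needs to be declared before choosing whether to adjust for the whole.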

Keywords: Compositional data; causal inference; collider bias; directed acyclic graphs; joint effects; relative effects.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Causality*
  • Confounding Factors, Epidemiologic
  • Data Interpretation, Statistical