The implications of Simpson's paradox for cross-scale inference among lakes

Water Res. 2019 Oct 15;163:114855. doi: 10.1016/j.watres.2019.114855. Epub 2019 Jul 13.


The use of cross-sectional data for ecological inference began as a practical means of pooling data to enable meaningful empirical model development. For example, limnologists routinely use sample averages from numerous individual lakes to examine patterns across lakes. The basic assumption behind the use of cross-lake data is often that responses within and across lakes are identical. As data from multiple study units across a wide range of spatiotemporal scales become increasingly accessible to researchers, an assessment of this assumption is now feasible. In this study, we demonstrate that this assumption is usually unjustified, due largely to a statistical phenomenon known as Simpson's paradox. By comparing versions of a commonly used empirical model of the effect of nutrients on algal growth, each developed using a different data set, we discuss the cognitive importance of distinguishing among factors affecting lake eutrophication that operate at different spatial and temporal scales. We propose the Bayesian hierarchical modeling approach as a way to properly structure the data analysis when data from multiple lakes are employed.
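The gap between within-lake and cross-lake responses can be illustrated with a minimal simulation. The sketch below is not the authors' analysis and uses no data from the study: all numbers (20 hypothetical lakes, 30 samples each, a cross-lake slope of 1.0 and a within-lake slope of 0.3 on a log-log scale) are assumptions chosen only to make the effect visible, and ordinary least squares stands in for the Bayesian hierarchical model the abstract recommends.

```python
import numpy as np

# Synthetic illustration only: every value below is an assumption,
# not an estimate from the paper or from any monitoring data set.
rng = np.random.default_rng(42)

n_lakes, n_obs = 20, 30
between_slope = 1.0  # assumed slope across lake means (log chl a vs. log TP)
within_slope = 0.3   # assumed slope within any single lake

# Each hypothetical lake has its own long-term mean log nutrient level.
lake_mean_tp = rng.uniform(0.5, 2.5, size=n_lakes)

# Individual samples scatter around the lake mean.
log_tp = lake_mean_tp[:, None] + rng.normal(0.0, 0.15, size=(n_lakes, n_obs))

# The response follows one slope across lakes and another within a lake.
log_chl = (
    between_slope * lake_mean_tp[:, None]
    + within_slope * (log_tp - lake_mean_tp[:, None])
    + rng.normal(0.0, 0.05, size=(n_lakes, n_obs))
)


def ols_slope(x, y):
    """Return the least-squares slope of y regressed on x."""
    return np.polyfit(x, y, 1)[0]


# Cross-lake model: regress lake-average response on lake-average nutrient.
cross_lake_slope = ols_slope(log_tp.mean(axis=1), log_chl.mean(axis=1))

# Within-lake models: one regression per lake, then average the slopes.
within_lake_slopes = np.array(
    [ols_slope(log_tp[i], log_chl[i]) for i in range(n_lakes)]
)
mean_within_slope = within_lake_slopes.mean()

print(f"cross-lake slope (lake averages): {cross_lake_slope:.2f}")
print(f"mean within-lake slope:           {mean_within_slope:.2f}")
```

Under these assumptions the slope fitted to lake averages recovers the cross-lake relationship, while the per-lake fits recover the much weaker within-lake relationship, so a model built from lake averages would overstate how a single lake responds to a change in its own nutrient level. A hierarchical model would estimate both levels jointly instead of fitting them separately as done here.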

Keywords: Chlorophyll a; LAGOSSE; Multilevel/hierarchical model; NLA.

MeSH terms

  • Bayes Theorem
  • Cross-Sectional Studies
  • Environmental Monitoring*
  • Eutrophication
  • Lakes*