Meta-analyzing the multiverse: A peek under the hood of selective reporting

Psychol Methods. 2023 May 11. doi: 10.1037/met0000559. Online ahead of print.

Abstract

Researcher degrees of freedom refer to arbitrary decisions in the execution and reporting of hypothesis-testing research that allow for many possible outcomes from a single study. Selective reporting of results (p-hacking) from this "multiverse" of outcomes can inflate effect size estimates and false-positive rates. We studied the effects of researcher degrees of freedom and selective reporting using empirical data from extensive multistudy projects in psychology (Registered Replication Reports) featuring 211 samples and 14 dependent variables. We used a counterfactual design to examine what biases could have emerged if the studies (and the ensuing meta-analyses) had not been preregistered and had instead been subjected to selective reporting based on the significance of the outcomes in the primary studies. Our results show the substantial variability in effect sizes that researcher degrees of freedom can create in relatively standard psychological studies, and how selective reporting of outcomes can alter conclusions and introduce bias into meta-analysis. Although the multiverses of the 294 included studies typically contained thousands of outcomes, significant effect sizes in the hypothesized direction emerged in only about 30% of studies. We also observed that the effect of a particular researcher degree of freedom was inconsistent across replication studies using the same protocol, meaning multiverse analyses often fail to replicate across samples. We recommend that hypothesis-testing researchers preregister their preferred analysis and openly report a multiverse analysis. We propose a descriptive index (underlying multiverse variability) that quantifies the robustness of results across alternative ways of analyzing the data. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
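To make the multiverse idea concrete, the following minimal sketch treats one researcher degree of freedom (the outlier-exclusion cutoff) as a set of equally defensible choices and computes the effect size under each one. The data, the cutoff values, and the spread-based robustness summary are all illustrative assumptions; the paper's underlying multiverse variability index is defined differently, and this stand-in only conveys how a single arbitrary decision fans out into multiple estimates.

```python
import random
import statistics

random.seed(1)

# Hypothetical two-group study: treatment vs. control (illustrative data only).
control = [random.gauss(0.0, 1.0) for _ in range(100)]
treatment = [random.gauss(0.2, 1.0) for _ in range(100)]

def cohens_d(a, b):
    """Standardized mean difference (pooled SD)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5

def exclude_outliers(xs, z):
    """Drop observations more than z SDs from the sample mean."""
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [x for x in xs if abs(x - m) <= z * s]

# A tiny multiverse: each arbitrary analytic choice defines one "universe".
outlier_cutoffs = [2.0, 2.5, 3.0, float("inf")]  # inf = no exclusion
effects = [
    cohens_d(exclude_outliers(treatment, z), exclude_outliers(control, z))
    for z in outlier_cutoffs
]

# Robustness across universes, sketched here simply as the range of the
# estimates (a stand-in, not the paper's index).
umv = max(effects) - min(effects)
print("effect sizes across", len(effects), "universes:",
      [round(d, 3) for d in effects])
print(f"range of estimates (illustrative robustness measure): {umv:.3f}")
```

A full multiverse would cross several such decisions (exclusion rules, transformations, covariate sets), so the number of universes grows multiplicatively; selectively reporting only the significant ones is the bias the paper quantifies.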