A fundamental goal of scientific research is to generate true positives (i.e., authentic discoveries). Statistically, a true positive is a significant finding for which the underlying effect size (δ) is greater than 0, whereas a false positive is a significant finding for which δ equals 0. However, the null hypothesis of no difference (δ = 0) may never be strictly true because innumerable nuisance factors can introduce small effects for theoretically uninteresting reasons. If δ never equals zero, then with sufficient power, every experiment would yield a significant result. Yet running studies with higher power by increasing sample size (N) is one of the most widely agreed-upon reforms to increase replicability. Moreover, and perhaps not surprisingly, the idea that psychology should attach greater value to small effect sizes is gaining currency. Increasing N without limit makes sense for purely measurement-focused research, where the magnitude of δ itself is of interest, but it makes less sense for theory-focused research, where the truth status of the theory under investigation is of interest. Increasing power to enhance replicability will increase true positives at the level of the effect size (statistical true positives) while also increasing false positives at the level of theory (theoretical false positives). With too much power, the cumulative foundation of psychological science would consist largely of nuisance effects masquerading as theoretically important discoveries. Positive predictive value at the level of theory is maximized by using an optimal N, one that is neither too small nor too large.
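The claim that theory-level positive predictive value peaks at an intermediate N can be illustrated with a minimal numerical sketch. The model below is an assumption made for illustration only and is not taken from the article: a two-sided, two-sample z-test with equal group sizes; a true theory produces a hypothetical effect `delta_theory`; a false theory still produces a small hypothetical nuisance effect `delta_nuisance`; and `prior` is the assumed proportion of tested theories that are true. The function names and all parameter values are likewise illustrative.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def power(delta, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample z-test for standardized effect delta.

    Assumes equal group sizes and known unit variance (a simplification).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def theory_ppv(n_per_group, prior=0.5, delta_theory=0.5,
               delta_nuisance=0.05, alpha=0.05):
    """Share of significant results that come from true theories.

    All default values are hypothetical, chosen only to show the shape
    of the curve. Direction of the effect is ignored for simplicity.
    """
    true_pos = prior * power(delta_theory, n_per_group, alpha)
    false_pos = (1 - prior) * power(delta_nuisance, n_per_group, alpha)
    return true_pos / (true_pos + false_pos)


def best_n(candidates, **kwargs):
    """Return the candidate per-group N with the highest theory-level PPV."""
    return max(candidates, key=lambda n: theory_ppv(n, **kwargs))


if __name__ == "__main__":
    for n in (5, 20, 50, 100, 500, 5000, 50000):
        print(n, round(theory_ppv(n), 3))
    print("best N among 2..2000:", best_n(range(2, 2001)))
```

Under these assumed values, theory-level PPV starts near the prior at very small N (where power is close to α for both effects), rises at moderate N (where the theory-driven effect is reliably detected but the nuisance effect mostly is not), and falls back toward the prior at very large N (where both are detected almost always). Where the peak falls depends entirely on the assumed effect sizes and prior.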
Keywords: False positives; Null hypothesis significance testing; Positive predictive value; Replication crisis.
© 2022. The Psychonomic Society, Inc.