Thou Shalt Not Bear False Witness Against Null Hypothesis Significance Testing

Miguel A García-Pérez

doi:10.1177/0013164416668232

Educ Psychol Meas. 2017 Aug;77(4):631-662. doi: 10.1177/0013164416668232. Epub 2016 Oct 5.

Author

Miguel A García-Pérez¹

Affiliation

¹ Universidad Complutense, Madrid, Spain.

Abstract

Null hypothesis significance testing (NHST) has been the subject of debate for decades, and alternative approaches to data analysis have been proposed. This article addresses this debate from the perspective of scientific inquiry and inference. Inference is an inverse problem, and the application of statistical methods cannot reveal whether effects exist or whether they are empirically meaningful. Hence, drawing conclusions from the outcomes of statistical analyses is subject to limitations. NHST has been criticized for its misuse and for the misconstruction of its outcomes, and critics have also stressed its inability to meet expectations that it was never designed to fulfil. Ironically, alternatives to NHST are identical in these respects, something that has been overlooked in their presentation. Three of those alternatives are discussed here (estimation via confidence intervals and effect sizes, quantification of evidence via Bayes factors, and mere reporting of descriptive statistics). None of them offers a solution to the problems that NHST is purported to have, all of them are susceptible to misuse and misinterpretation, and some bring their own problems (e.g., Bayes factors have a one-to-one correspondence with p values, but they are entirely deprived of an inferential framework). Those alternatives also fail to cover a broad area of inference not involving distributional parameters, where NHST procedures remain the only (and suitable) option. Like knives or axes, NHST is not inherently evil; only misuse and misinterpretation of its outcomes need to be eradicated.

Keywords: Bayes factor; estimation; goodness of fit; inverse problem; significance testing.