Statistical significance: p value, 0.05 threshold, and applications to radiomics-reasons for a conservative approach

Giovanni Di Leo; Francesco Sardanelli

doi:10.1186/s41747-020-0145-y

Eur Radiol Exp. 2020 Mar 11;4(1):18. doi: 10.1186/s41747-020-0145-y.

Authors

Giovanni Di Leo¹, Francesco Sardanelli²,³

Affiliations

¹ Radiology Unit, IRCCS Policlinico San Donato, Via Morandi 30, 20097, San Donato Milanese, Italy. gianni.dileo77@gmail.com.
² Radiology Unit, IRCCS Policlinico San Donato, Via Morandi 30, 20097, San Donato Milanese, Italy.
³ Dipartimento di Scienze Biomediche per la Salute, Università degli Studi di Milano, Via Morandi 30, 20097, San Donato Milanese, Italy.

Abstract

Here, we summarise the unresolved debate about p value and its dichotomisation. We present the statement of the American Statistical Association against the misuse of statistical significance as well as the proposals to abandon the use of p value and to reduce the significance threshold from 0.05 to 0.005. We highlight reasons for a conservative approach, as clinical research needs dichotomic answers to guide decision-making, in particular in the case of diagnostic imaging and interventional radiology. With a reduced p value threshold, the cost of research could increase while spontaneous research could be reduced. Secondary evidence from systematic reviews/meta-analyses, data sharing, and cost-effective analyses are better ways to mitigate the false discovery rate and lack of reproducibility associated with the use of the 0.05 threshold. Importantly, when reporting p values, authors should always provide the actual value, not only statements of "p < 0.05" or "p ≥ 0.05", because p values give a measure of the degree of data compatibility with the null hypothesis. Notably, radiomics and big data, fuelled by the application of artificial intelligence, involve hundreds/thousands of tested features similarly to other "omics" such as genomics, where a reduction in the significance threshold, based on well-known corrections for multiple testing, has been already adopted.
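The closing remark about corrections for multiple testing can be illustrated with a minimal Python sketch. It is not taken from the article: the function names (`bonferroni_threshold`, `benjamini_hochberg`) and the ten p values are hypothetical and chosen only for illustration. The sketch assumes two commonly used corrections, Bonferroni (which divides the significance threshold by the number of tests) and Benjamini-Hochberg (which controls the false discovery rate); the abstract does not specify which corrections it refers to.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected
    by the Benjamini-Hochberg step-up procedure at false discovery rate q."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= k * q / m.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p values for ten tested features (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

uncorrected = [p < 0.05 for p in p_values]
threshold = bonferroni_threshold(0.05, len(p_values))
bonferroni = [p < threshold for p in p_values]
bh = benjamini_hochberg(p_values, q=0.05)

print("Uncorrected significant:", sum(uncorrected))
print("Bonferroni significant: ", sum(bonferroni))
print("BH significant:         ", sum(bh))
```

With these made-up values, five features fall below the uncorrected 0.05 threshold, one survives Bonferroni, and two survive Benjamini-Hochberg. With 1,000 tested features, the Bonferroni per-test threshold at an overall 0.05 level would be 0.00005, which shows how quickly the effective threshold shrinks as the number of features grows.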

Keywords: Confidence intervals; Decision making; Models (statistical); Radiomics; Reproducibility of results.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Artificial Intelligence
Big Data
Data Interpretation, Statistical*
Diagnostic Imaging*
Humans
Radiology, Interventional*