PLoS Biol. 2016 Nov 10;14(11):e2000995. doi: 10.1371/journal.pbio.2000995. eCollection 2016 Nov.

Current Incentives for Scientists Lead to Underpowered Studies With Erroneous Conclusions

Andrew D Higginson et al. PLoS Biol. 2016.

Abstract

We can regard the wider incentive structures that operate across science, such as the priority given to novel findings, as an ecosystem within which scientists strive to maximise their fitness (i.e., publication record and career success). Here, we develop an optimality model that predicts the most rational research strategy, in terms of the proportion of research effort spent on seeking novel results rather than on confirmatory studies, and the amount of research effort per exploratory study. We show that, for parameter values derived from the scientific literature, researchers acting to maximise their fitness should spend most of their effort seeking novel results and conduct small studies that have only 10%-40% statistical power. As a result, half of the studies they publish will report erroneous conclusions. Current incentive structures are in conflict with maximising the scientific value of research; we suggest ways that the scientific ecosystem could be improved.
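To see why these incentives favour many small studies, consider a fixed sampling budget split into studies of size n. The sketch below is our illustration, not the authors' model: it assumes a two-sided test of a correlation coefficient, with power approximated via the Fisher z transformation, and borrows T = 2,000, r = 0.21, fE = 0.2, and α = 0.05 from the figure legends below.

```python
import math
from scipy.stats import norm

def power_fisher_z(r, n, alpha=0.05):
    """Approximate two-sided power to detect a true correlation r with
    n participants, using the Fisher z transformation."""
    if n <= 3:
        return alpha
    z_crit = norm.ppf(1 - alpha / 2)
    return 1 - norm.cdf(z_crit - math.atanh(r) * math.sqrt(n - 3))

# Split a fixed sampling budget T into as many studies of size n as fit.
T, r, f, alpha = 2000, 0.21, 0.2, 0.05  # values from the figure legends
for n in (20, 50, 120, 500):
    studies = T // n
    w = power_fisher_z(r, n, alpha)
    # Expected significant (publishable) results: true + false positives.
    hits = studies * (f * w + (1 - f) * alpha)
    print(f"n={n:3d}  power={w:.2f}  studies={studies:3d}  hits={hits:.1f}")
```

Under these assumptions, 100 studies of n = 20 (14% power) yield about seven expected significant results from the budget, versus about one from four studies of n = 500 (near-100% power), which is the logic behind the predicted 10%-40% power.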

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Fitness landscape for an individual researcher.
An individual researcher is able to choose the parameters θ (y-axis) and SE; the x-axis shows the resultant power of exploratory studies, WE. White indicates high fitness, black low fitness. For small values of SE, few papers are accepted, while for high values of SE, few studies are carried out. For low values of θ, few novel studies are carried out. (A) γ = 0.09, ϕ = 0.9. The optimal strategy that maximises individual fitness is therefore to carry out many small exploratory studies with a power of around 15%. (B) γ = 0.055, ϕ = 0.55. A mixture of exploratory and confirmatory work should be carried out with slightly higher power (20%).
Fig 2
Fig 2. Effect of varying the weighting given to published exploratory studies (γ).
Parameter γ reflects the relative importance of published exploratory studies. The lines show predictions for two values of the probability that an effect is real (fE) and two values of the effect sizes rC and rE (solid: fE = 0.2, rC = rE = 0.21; dotted: fE = 0.3, rC = rE = 0.21; dashed: fE = 0.2, rC = rE = 0.32). The panels show (A) the optimal proportion of total sampling to spend on exploratory studies θ*, (B) the optimal sample size of exploratory studies SE*, (C) the resultant total number of published studies NE + NC, (D) the proportion of published studies that are confirmatory NC / (NE + NC), (E) the statistical power of exploratory studies WE, and (F) the proportion of published studies that draw incorrect conclusions (PF). Other values: SC = 120, T = 2,000, k = 20, α = 0.05, σ2 = 1, m = 3, and ϕ = 0.8. The chosen values for rC = rE reflect data reported by Richard and colleagues [14], where a correlation coefficient mode of 0.09 and a mean of 0.21 were observed. These values are in the middle of the range of effect sizes observed in meta-analyses across a number of biomedical research domains (range r ~ 0.15 to 0.50) [13].
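The abstract's "half erroneous" figure follows directly from these parameters. Below is a minimal sketch of the calculation (our simplification of PF: it counts only significant exploratory results as published and ignores confirmatory studies).

```python
def false_report_fraction(f, power, alpha=0.05):
    """Fraction of published significant results that are false positives:
    (1 - f) * alpha / (f * power + (1 - f) * alpha)."""
    return (1 - f) * alpha / (f * power + (1 - f) * alpha)

# With fE = 0.2 and the ~20% power predicted for rational researchers,
# half of the published findings are wrong:
print(false_report_fraction(f=0.2, power=0.20))  # -> 0.5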
Fig 3
Fig 3. Effect of γ and ϕ on a hypothetical measure of the total scientific value of research (VS).
The figure shows the product of the number of published confirmatory studies, the number of published exploratory studies, and the proportion of published studies that are correct (red = high, blue = low). This measure is calculated for when all researchers follow the rational strategy given the values of γ and ϕ. The current emphasis on a small number of publications that report novel findings is characterised by high γ and high ϕ (top right). To improve scientific output according to this measure, we could reduce ϕ (i.e., make more published studies count for researchers’ careers) or reduce γ (i.e., reduce the weighting of published exploratory studies). Interestingly, the ridge is flat, so any point along it has equal scientific value. Therefore, a pragmatic compromise would be to reduce both γ and ϕ, each by a lesser amount. The panels show VS for two values of the dependence of acceptance on sample size m and the Type I error rate α: (A) α = 0.05, m = 3, colour range: 2.0–3.18; (B) α = 0.05, m = 6, colour range: 2.0–2.82; (C) α = 0.03, m = 6, colour range: 2.0–2.065. Other values: SC = 120, T = 2,000, k = 20, fE = 0.2, rC = rE = 0.21, and σ2 = 1.
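As a small illustration of this measure (our reading of the legend: VS is the product of the two publication counts and the proportion of correct conclusions; the numbers below are hypothetical), the flat ridge means different (γ, ϕ) settings can trade publication volume against reliability at roughly equal total value.

```python
def scientific_value(n_expl, n_conf, prop_correct):
    """Hypothetical total value of research, per the legend: product of
    published exploratory studies, published confirmatory studies, and
    the proportion of published studies that are correct."""
    return n_expl * n_conf * prop_correct

# Hypothetical points: many less-reliable studies vs. fewer, more
# reliable ones can score (almost) the same total value.
print(scientific_value(40, 10, 0.60))  # 240.0
print(scientific_value(24, 12, 0.83))  # ~239
```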
Fig 4
Fig 4. Effect of editorial stringency on total scientific output for current incentive structures.
The figure shows the proportion of published findings that are correct, 1 − PF (A, B), the total number of published studies NE + NC (C, D), and the total scientific value of research VS (E, F). We varied the following parameters: the probability of a Type I error α (A, C, E) and the dependence of acceptance on sample size m (B, D, F). The lines show predictions for two values of the probability that an effect is real (fE) and two values of the effect size (solid: fE = 0.2, rC = rE = 0.21; dotted: fE = 0.3, rC = rE = 0.21; and dashed: fE = 0.2, rC = rE = 0.32). Other values: SC = 120, T = 2,000, k = 20, α = 0.05, σ2 = 1, m = 3, ϕ = 0.9, and γ = 0.09. The steps occur where there is a discontinuity in the effect of α on SE*.
Fig 5
Fig 5. Effect of editorial stringency on total scientific output for ideal incentive structures.
The figure shows the proportion of published studies that are correct, 1 − PF (A, B), the total number of published studies NE + NC (C, D), and the total scientific value of research VS (E, F). We varied the following parameters: the probability of a Type I error α (A, C, E) and the dependence of acceptance on sample size m (B, D, F). The lines show predictions for two values of the probability that an effect is real (fE) and two values of the effect size (solid: fE = 0.2, rC = rE = 0.21; dotted: fE = 0.3, rC = rE = 0.21; dashed: fE = 0.2, rC = rE = 0.32). Other values: SC = 120, T = 2,000, k = 20, α = 0.05, σ2 = 1, m = 3, ϕ = 0.55, and γ = 0.055.

References

    1. van Dijk D, Manor O, Carey LB (2014) Publication metrics and success on the academic job market. Curr Biol 24: R516–R517. doi: 10.1016/j.cub.2014.04.039
    2. Button KS, Ioannidis JP, Mokrysz C, Nosek BA, Flint J, et al. (2013) Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14: 365–376. doi: 10.1038/nrn3475
    3. Szucs D, Ioannidis JPA (2016) Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. bioRxiv 071530.
    4. Open Science Collaboration (2015) Estimating the reproducibility of psychological science. Science 349: aac4716. doi: 10.1126/science.aac4716
    5. Ioannidis JP (2005) Why most published research findings are false. PLoS Med 2: e124. doi: 10.1371/journal.pmed.0020124

Grant support

Medical Research Council and the University of Bristol (grant number MC_UU_12013/6), received by MRM. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Natural Environment Research Council (grant number NE/L011921/1), received by ADH. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. MRM is a member of the UK Centre for Tobacco and Alcohol Studies, a UKCRC Public Health Research: Centre of Excellence. Funding from the British Heart Foundation, Cancer Research UK, the Economic and Social Research Council, the Medical Research Council, and the National Institute for Health Research, under the auspices of the UK Clinical Research Collaboration, is gratefully acknowledged.