Two-sample tα-test for testing hypotheses in small-sample experiments

Int J Biostat. 2022 Jun 24;19(1):1-19. doi: 10.1515/ijb-2021-0047. eCollection 2023 May 1.

Abstract

It has been reported that about half of biological discoveries are irreproducible. This irreproducibility has been partially attributed to poor statistical power, which is largely due to small sample sizes. However, in molecular biology and medicine, limited biological resources and budgets mean that most molecular biological experiments are conducted with small samples. The two-sample t-test controls bias by using degrees of freedom, but this also implies that the t-test has low power in small samples. A discovery made with low statistical power is likely to have poor reproducibility. Increasing statistical power is therefore not a feasible way to enhance reproducibility in small-sample experiments. An alternative is to reduce the type I error rate, and to this end a so-called tα-test was developed. Both theoretical analysis and a simulation study demonstrate that the tα-test substantially outperforms the t-test, although the tα-test reduces to the t-test when sample sizes are over 15. Large-scale simulation studies and real experimental data show that the tα-test significantly reduced the type I error rate compared with the t-test and the Wilcoxon test in small-sample experiments, while having almost the same empirical power as the t-test. The null p-value density distribution explains why the tα-test had a much lower type I error rate than the t-test. One real experimental dataset provides a typical example in which the tα-test outperforms the t-test, and a microarray dataset showed that the tα-test had the best performance among five statistical methods. In addition, the probability density function and cumulative distribution function of the tα-statistic are given mathematically, and the theoretical and observed distributions match well.
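To make the terms "type I error rate" and "empirical power" concrete, the following minimal sketch estimates both by simulation for the classic pooled-variance two-sample t-test only. It does not implement the tα-test, whose statistic is not given in the abstract. The group size (3 per group), the normal data model, the one-standard-deviation shift, the number of simulations, and all function and variable names are illustrative assumptions, not settings taken from the study.

```python
import random
import statistics

# Illustrative assumptions (not from the study): 3 observations per group,
# normally distributed data with unit variance, 20,000 simulated experiments.
N_PER_GROUP = 3
N_SIM = 20000

# Two-sided 5% critical value of Student's t with 4 degrees of freedom
# (n1 + n2 - 2 = 4 for two groups of 3). Tied to N_PER_GROUP above.
T_CRIT_DF4 = 2.776


def pooled_t(x, y):
    """Equal-variance (pooled) two-sample t statistic."""
    n1, n2 = len(x), len(y)
    pooled_var = (
        (n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)
    ) / (n1 + n2 - 2)
    standard_error = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / standard_error


def rejection_rate(shift, seed=0):
    """Fraction of simulated experiments in which the t-test rejects at 5%.

    shift = 0 gives the empirical type I error rate;
    shift != 0 gives the empirical power against that mean difference.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(N_SIM):
        x = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
        y = [rng.gauss(shift, 1.0) for _ in range(N_PER_GROUP)]
        if abs(pooled_t(x, y)) > T_CRIT_DF4:
            rejections += 1
    return rejections / N_SIM


type_i_error = rejection_rate(shift=0.0)  # null hypothesis is true
power = rejection_rate(shift=1.0)         # true difference of one SD

print(f"empirical type I error rate: {type_i_error:.3f}")
print(f"empirical power (shift = 1 SD): {power:.3f}")
```

Under these assumptions the t-test should hold its nominal 5% type I error rate, while its power against a one-standard-deviation difference stays low, which is the small-sample situation the abstract describes.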

Keywords: cumulative distribution function; hypothesis test; power; small samples; type I error; type II error.

MeSH terms

  • Computer Simulation
  • Likelihood Functions
  • Models, Statistical*
  • Reproducibility of Results
  • Sample Size