A permutation test for inference in logistic regression with small- and moderate-sized data sets

Stat Med. 2005 Mar 15;24(5):693-708. doi: 10.1002/sim.1931.


Inference based on large sample results can be highly inaccurate if applied to logistic regression with small data sets. Furthermore, maximum likelihood estimates for the regression parameters occasionally do not exist, in which case large sample results are invalid. Exact conditional logistic regression is an alternative that can be used whether or not maximum likelihood estimates exist, but can be overly conservative. This approach also requires grouping the values of continuous variables corresponding to nuisance parameters, and inference can depend on how this is done. A simple permutation test of the hypothesis that a regression parameter is zero can overcome these limitations. The variable of interest is replaced by the residuals from a linear regression of it on all other independent variables. Logistic regressions are then fitted for permutations of these residuals, and a p-value is computed by comparing the resulting likelihood ratio statistics to the original observed value. Simulations of binary outcome data with two independent variables that have binary or lognormal distributions yield the following results: (a) in small data sets consisting of 20 observations, type I error is well-controlled by the permutation test, but poorly controlled by the asymptotic likelihood ratio test; (b) in large data sets consisting of 1000 observations, performance of the permutation test appears equivalent to that of the asymptotic test; and (c) in small data sets, the p-value for the permutation test is usually similar to the mid-p-value for exact conditional logistic regression.
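The permutation procedure described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `max_loglik` and `permutation_lr_test`, the Newton-Raphson fitter with step-halving, the use of random rather than exhaustive permutations, the default of 999 permutations, and the (1 + count) / (n_perm + 1) p-value convention are all choices introduced here.

```python
import numpy as np


def max_loglik(X, y, n_iter=50, tol=1e-8):
    """Maximized logistic log-likelihood (hypothetical helper, an assumption).

    Newton-Raphson with step-halving. The iteration cap keeps the returned
    value finite even when maximum likelihood estimates do not exist.
    """
    def loglik(b):
        eta = X @ b
        # Numerically stable form of sum(y*eta - log(1 + exp(eta))).
        return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

    beta = np.zeros(X.shape[1])
    ll = loglik(beta)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30.0, 30.0)))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        # Least-squares solve tolerates a near-singular Hessian.
        step = np.linalg.lstsq(hess, grad, rcond=None)[0]
        t = 1.0
        while t > 1e-4 and loglik(beta + t * step) < ll:
            t /= 2.0  # halve the step until the log-likelihood stops dropping
        new_ll = loglik(beta + t * step)
        if new_ll - ll < tol:
            return max(ll, new_ll)
        beta, ll = beta + t * step, new_ll
    return ll


def permutation_lr_test(y, x, Z, n_perm=999, seed=0):
    """Permutation test of H0: coefficient of x is zero, adjusting for Z.

    y: binary outcome (0/1 floats), x: variable of interest,
    Z: other independent variables (1-D or 2-D array).
    Returns (observed likelihood ratio statistic, permutation p-value).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    W = np.column_stack([np.ones(n), Z])  # intercept plus other covariates

    # Replace x by its residuals from a linear regression on W.
    coef = np.linalg.lstsq(W, x, rcond=None)[0]
    resid = x - W @ coef

    # The null model does not involve x, so it is fitted only once.
    ll_null = max_loglik(W, y)

    def lr_stat(r):
        return 2.0 * (max_loglik(np.column_stack([W, r]), y) - ll_null)

    observed = lr_stat(resid)
    permuted = np.array([lr_stat(rng.permutation(resid)) for _ in range(n_perm)])

    # Assumed convention: count the observed statistic among the permutations.
    p_value = (1 + np.sum(permuted >= observed - 1e-10)) / (n_perm + 1)
    return observed, float(p_value)
```

Because the residuals and the original variable span the same column space once the other covariates are in the model, the observed likelihood ratio statistic equals the one obtained from the untransformed variable; only the permuted fits differ. With very small data sets, enumerating all distinct permutations instead of sampling them may be feasible.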

Publication types

  • Comparative Study

MeSH terms

  • Clinical Trials as Topic / methods*
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Humans
  • Likelihood Functions*
  • Logistic Models*
  • Sample Size
  • Urinary Incontinence / drug therapy