No rationale for 1 variable per 10 events criterion for binary logistic regression analysis

BMC Med Res Methodol. 2016 Nov 24;16(1):163. doi: 10.1186/s12874-016-0267-3.

Abstract

Background: Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion, only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies.
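For readers unfamiliar with the criterion, EPV is a simple ratio of the number of outcome events to the number of candidate variables. The sketch below is illustrative only: the function name `events_per_variable` and the data are hypothetical, and counting "events" as the less frequent outcome class is a common convention assumed here, not something stated in the abstract.

```python
def events_per_variable(outcomes, n_variables):
    """Return events per variable (EPV) for a binary outcome vector.

    Assumption: "events" means the count of the less frequent outcome class.
    """
    if n_variables <= 0:
        raise ValueError("n_variables must be positive")
    n_ones = sum(1 for y in outcomes if y == 1)
    n_zeros = len(outcomes) - n_ones
    return min(n_ones, n_zeros) / n_variables


# Hypothetical data: 200 subjects, 30 with the outcome, 5 candidate covariates.
outcomes = [1] * 30 + [0] * 170
epv = events_per_variable(outcomes, n_variables=5)
print(epv)  # 6.0, which falls below the 10 EPV rule of thumb
```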

Methods: The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth's correction, are compared.
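The comparison described above can be illustrated with a deliberately simplified sketch. It assumes an intercept-only logistic model, which is not the design used in the study, because in that special case Firth's penalized likelihood has a closed-form solution: the estimated probability is (events + 0.5) / (n + 1). The function names, the true probability, and the sample size are all hypothetical choices made for illustration.

```python
import math
import random


def ml_logit(events, n):
    """Maximum likelihood intercept; None when the estimate does not exist."""
    if events == 0 or events == n:
        return None  # the estimate diverges to -inf or +inf
    return math.log(events / (n - events))


def firth_logit(events, n):
    """Firth-corrected intercept (closed form for the intercept-only model)."""
    return math.log((events + 0.5) / (n - events + 0.5))


def simulate(true_p, n, n_reps, seed=1):
    """Monte Carlo comparison of both estimators on the logit scale."""
    rng = random.Random(seed)
    true_logit = math.log(true_p / (1 - true_p))
    ml_values, firth_values = [], []
    for _ in range(n_reps):
        events = sum(rng.random() < true_p for _ in range(n))
        ml = ml_logit(events, n)
        if ml is not None:
            ml_values.append(ml)
        firth_values.append(firth_logit(events, n))
    ml_bias = sum(ml_values) / len(ml_values) - true_logit if ml_values else None
    return {
        "ml_undefined": n_reps - len(ml_values),  # data sets with no finite ML estimate
        "ml_bias": ml_bias,                       # computed over finite estimates only
        "firth_bias": sum(firth_values) / n_reps - true_logit,
    }


print(simulate(true_p=0.2, n=20, n_reps=2000))
```

Note that the maximum likelihood bias here is computed only over data sets where a finite estimate exists, which is itself one of several possible ways of handling degenerate data sets and can change the apparent result.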

Results: The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation.
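To make the notion of separation concrete, the following minimal sketch classifies a data set with a single covariate. It is an assumption-laden illustration, not the detection procedure used in the study: with several covariates, separation involves linear combinations of covariates and needs a more general check. The function name `separation_status` and the example data are hypothetical.

```python
def separation_status(x, y):
    """Classify separation for one covariate x and a binary outcome y.

    Returns "degenerate" (only one outcome class observed), "complete",
    "quasi-complete", or "none".
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        return "degenerate"
    if min(x) == max(x):
        return "none"  # a constant covariate cannot separate the outcomes
    # Complete separation: a threshold on x splits the outcome classes perfectly.
    if max(x0) < min(x1) or max(x1) < min(x0):
        return "complete"
    # Quasi-complete separation: perfect split except for ties at the threshold.
    if max(x0) == min(x1) or max(x1) == min(x0):
        return "quasi-complete"
    return "none"


print(separation_status([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))  # complete
print(separation_status([1, 2, 3, 3, 4, 5], [0, 0, 0, 1, 1, 1]))  # quasi-complete
print(separation_status([1, 4, 2, 5, 3, 6], [0, 0, 1, 1, 0, 1]))  # none
```

In a simulation study, whether such data sets are dropped, kept, or flagged by a different criterion is an analyst's choice, which is the kind of decision the results above identify as influential.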

Conclusions: The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance on sample size considerations for binary logistic regression analysis.

Keywords: Bias; EPV; Logistic regression; Sample size; Separation; Simulations.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bias*
  • Computer Simulation
  • Humans
  • Logistic Models*
  • Monte Carlo Method*
  • Reproducibility of Results
  • Sample Size*