Bias-reduced and separation-proof conditional logistic regression with small or sparse data sets

Stat Med. 2010 Mar 30;29(7-8):770-7. doi: 10.1002/sim.3794.


Conditional logistic regression is used for the analysis of binary outcomes when subjects are stratified into several subsets, e.g. matched pairs or blocks. Log odds ratio estimates are usually found by maximizing the conditional likelihood. This approach eliminates all strata-specific parameters by conditioning on the number of events within each stratum. However, in the analyses of both an animal experiment and a lung cancer case-control study, conditional maximum likelihood (CML) resulted in infinite odds ratio estimates and monotone likelihood. Estimation can be improved by using Cytel Inc.'s well-known LogXact software, which provides a median unbiased estimate and exact or mid-p confidence intervals. Here, we suggest and outline point and interval estimation based on maximization of a penalized conditional likelihood in the spirit of Firth's (Biometrika 1993; 80:27-38) bias correction method (CFL). We present comparative analyses of both studies, demonstrating some advantages of CFL over competitors. We report on a small-sample simulation study where CFL log odds ratio estimates were almost unbiased, whereas LogXact estimates showed some bias and CML estimates exhibited serious bias. Confidence intervals and tests based on the penalized conditional likelihood had close-to-nominal coverage rates and yielded highest power among all methods compared, respectively. Therefore, we propose CFL as an attractive solution to the stratified analysis of binary data, irrespective of the occurrence of monotone likelihood. A SAS program implementing CFL is available at:
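The following Python sketch is an illustration added to make the idea of a penalized conditional likelihood concrete. It is not the authors' SAS program and makes no claim to reproduce their implementation. It assumes the simplest setting: 1:1 matched pairs with a single covariate, where each pair's conditional likelihood contribution depends only on the covariate difference d = x_case - x_control. It also assumes a Firth-type penalty equal to half the log of the conditional information. All function names are hypothetical. With five discordant pairs in which the case is always the exposed member, the unpenalized conditional log-likelihood increases monotonically in beta, so the CML estimate is infinite. The penalized version has a finite maximum, which in this special case has the closed form log((5 + 0.5) / (0 + 0.5)) = log(11).

```python
import math


def log_expit(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def conditional_loglik(beta, diffs):
    """Conditional log-likelihood for 1:1 matched pairs.

    Each pair contributes log expit(beta * d), where d is the
    case-minus-control covariate difference (assumed setting).
    """
    return sum(log_expit(beta * d) for d in diffs)


def conditional_information(beta, diffs):
    """Information for beta: sum of d^2 * p * (1 - p) over pairs."""
    total = 0.0
    for d in diffs:
        # p * (1 - p) computed on the log scale to avoid cancellation
        total += d * d * math.exp(log_expit(beta * d) + log_expit(-beta * d))
    return total


def penalized_loglik(beta, diffs):
    """Firth-type penalty (assumption): add 0.5 * log(information)."""
    info = conditional_information(beta, diffs)
    return conditional_loglik(beta, diffs) + 0.5 * math.log(info)


def golden_max(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximum of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


# Hypothetical data: five discordant pairs, case exposed in every pair.
diffs = [1, 1, 1, 1, 1]

# Search range [-15, 15] is an arbitrary choice for this illustration.
beta_hat = golden_max(lambda b: penalized_loglik(b, diffs), -15.0, 15.0)
odds_ratio_hat = math.exp(beta_hat)
print(beta_hat, odds_ratio_hat)  # close to log(11) and 11
```

The search interval and the golden-section optimizer are conveniences for this one-parameter example; a general implementation with several covariates or larger matched sets would need a multivariate optimizer and the full conditional likelihood.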

MeSH terms

  • Aneurysm / epidemiology
  • Animals
  • Bias*
  • Biostatistics*
  • Breast Neoplasms / radiotherapy
  • Case-Control Studies
  • Computer Simulation / statistics & numerical data
  • Effect Modifier, Epidemiologic*
  • Female
  • Heparin / adverse effects
  • Heparin / therapeutic use
  • Humans
  • Likelihood Functions
  • Logistic Models*
  • Lung Neoplasms / epidemiology
  • Lung Neoplasms / etiology
  • Neoplasms, Radiation-Induced / epidemiology
  • Rats
  • Risk Factors
  • Smoking / adverse effects
  • Smoking / epidemiology
  • Software / statistics & numerical data
  • Transplantation, Heterologous / adverse effects
  • Transplantation, Heterologous / statistics & numerical data


Substances

  • Heparin