Weighted likelihood, pseudo-likelihood and maximum likelihood methods for logistic regression analysis of two-stage data

Stat Med. 1997 Jan 15-Feb 15;16(1-3):103-16. doi: 10.1002/(sici)1097-0258(19970115)16:1<103::aid-sim474>3.0.co;2-p.


General approaches to the fitting of binary response models to data collected in two-stage and other stratified sampling designs include weighted likelihood, pseudo-likelihood and full maximum likelihood. In previous work the authors developed the large sample theory and methodology for fitting of logistic regression models to two-stage case-control data using full maximum likelihood. The present paper describes computational algorithms that permit efficient estimation of regression coefficients using weighted, pseudo- and full maximum likelihood. It also presents results of a simulation study involving continuous covariables where maximum likelihood clearly outperformed the other two methods and discusses the analysis of data from three bona fide case-control studies that illustrate some important relationships among the three methods. A concluding section discusses the application of two-stage methods to case-control studies with validation subsampling for control of measurement error.
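The abstract does not spell out the algorithms themselves. As an illustration of the simplest of the three approaches, the following is a minimal sketch of a weighted (Horvitz-Thompson type) likelihood fit, not the authors' implementation. It assumes that the first stage records the outcome and a stratum for all subjects, that covariates are measured only on a second-stage subsample, and that each second-stage subject is weighted by the inverse sampling fraction N_ij / n_ij of its outcome-by-stratum cell. The function names, the dictionary `first_stage_counts`, and the N_ij / n_ij notation are assumptions introduced here for illustration.

```python
import math

def two_stage_weights(y, stratum, first_stage_counts):
    """Inverse sampling fractions N_ij / n_ij for each second-stage subject.

    first_stage_counts maps (outcome, stratum) -> N_ij, the first-stage
    cell count (hypothetical input format; a missing cell raises KeyError).
    """
    n = {}
    for cell in zip(y, stratum):
        n[cell] = n.get(cell, 0) + 1          # second-stage cell counts n_ij
    return [first_stage_counts[cell] / n[cell] for cell in zip(y, stratum)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (M[r][k] - sum(M[r][j] * x[j] for j in range(r + 1, k))) / M[r][r]
    return x

def weighted_logistic_fit(X, y, w, n_iter=50, tol=1e-10):
    """Newton-Raphson maximisation of sum_i w_i * loglik_i for a logistic model.

    X is a list of covariate rows (include a leading 1 for an intercept).
    Returns the coefficient estimates only; valid standard errors for a
    weighted fit need a sandwich-type variance, which is not computed here.
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, yi, wi in zip(X, y, w):
            eta = sum(b * x for b, x in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            for a in range(k):
                score[a] += wi * (yi - p) * xi[a]            # weighted score
                for c in range(k):
                    info[a][c] += wi * p * (1.0 - p) * xi[a] * xi[c]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

A call such as `weighted_logistic_fit(X, y, two_stage_weights(y, stratum, first_stage_counts))` would then return the weighted coefficient estimates. The pseudo-likelihood and full maximum likelihood methods compared in the paper use the first-stage counts differently and are not reproduced by this sketch.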

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Adult
  • Age Distribution
  • Aged
  • Case-Control Studies
  • Data Collection
  • Data Interpretation, Statistical
  • Female
  • Humans
  • Infant
  • Infant Mortality
  • Infant, Newborn
  • Likelihood Functions*
  • Logistic Models*
  • Lung Neoplasms / etiology
  • Male
  • Middle Aged
  • Occupational Diseases / etiology
  • Risk Factors
  • Sampling Studies
  • Sex Distribution
  • Smoking / adverse effects
  • Survival Analysis