Estimation of probabilities using the logistic model in retrospective studies

Comput Biomed Res. 1988 Oct;21(5):449-70. doi: 10.1016/0010-4809(88)90004-3.


Methods for estimating the parameters of the logistic regression model from data collected under a case-control (retrospective) scheme are compared. The regression coefficients are estimated by maximum likelihood, which leaves the constant term to be estimated. Four methods for estimating this parameter are proposed. The four estimators are compared in two parts. First, they are compared for large samples via their asymptotic distributions. Second, they are compared for small samples via simulation using 11 logistic models. The main focus of this paper is the estimation of the posterior probability that the response variable is a success (Px), as given by the logistic regression model, when the constant term is estimated by each of the four proposed methods. A third concern is the comparison of the logistic discriminant procedures obtained when each of the four methods is used to estimate the constant term. The linear discriminant function procedure is also included in this comparison, which is carried out only for small samples via simulation. It was found that, when estimating Px, method 1 (which is essentially the MLE) minimizes the expected mean square error. The results were not as clear when the parameter of interest was the constant term itself. The classification comparisons implied that when the logistic model contains mostly (or all) binary regression variables, the logistic discriminant procedure using method 1 to estimate the constant term gives the minimum expected error rate; otherwise the linear discriminant function gives the minimum expected error rate. In the latter case the logistic discriminant procedure (with the method 1 estimator of the constant term) is approximately as good.
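
The abstract does not spell out the four estimators of the constant term, so the following is only an illustrative sketch of why the constant term needs separate treatment under case-control sampling. It uses the standard offset correction, which assumes the population prevalence of the outcome is known; this is not claimed to be any of the paper's four methods. The function names, the argument `prevalence`, and the numbers in the usage lines are hypothetical.

```python
import math


def corrected_intercept(b0_sample, n_cases, n_controls, prevalence):
    """Adjust an intercept fitted on case-control data to the population scale.

    Assumption (not from the abstract): the population prevalence of the
    outcome is known. Under case-control sampling the slope estimates are
    unaffected, but the fitted intercept is shifted by the log ratio of the
    sampling fractions, log(n_cases / n_controls) - logit(prevalence).
    """
    sampling_log_odds = math.log(n_cases / n_controls)
    population_log_odds = math.log(prevalence / (1.0 - prevalence))
    return b0_sample - sampling_log_odds + population_log_odds


def posterior_probability(b0, coefs, x):
    """Logistic-model probability of success, Px, for covariate vector x."""
    eta = b0 + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical usage: 100 cases, 100 controls, assumed prevalence of 10%.
b0 = corrected_intercept(0.0, n_cases=100, n_controls=100, prevalence=0.1)
px = posterior_probability(b0, coefs=[0.8], x=[1.0])
```

Because the shift enters only through the constant term, an error in the assumed prevalence moves every estimated Px in the same direction, which is one reason the choice of estimator for this parameter matters for the probabilities and error rates discussed above.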

MeSH terms

  • Computer Simulation
  • Coronary Disease / epidemiology
  • Humans
  • Models, Theoretical*
  • Probability*
  • Regression Analysis
  • Retrospective Studies*
  • Risk Factors