Estimating predicted probabilities from logistic regression: different methods correspond to different target populations

Int J Epidemiol. 2014 Jun;43(3):962-70. doi: 10.1093/ije/dyu029. Epub 2014 Mar 5.


Background: We review three common methods to estimate predicted probabilities following confounder-adjusted logistic regression: marginal standardization (predicted probabilities summed to a weighted average reflecting the confounder distribution in the target population); prediction at the modes (conditional predicted probabilities calculated by setting each confounder to its modal value); and prediction at the means (predicted probabilities calculated by setting each confounder to its mean value). That each method corresponds to a different target population is underappreciated in practice. Specifically, prediction at the means is often incorrectly interpreted as estimating average probabilities for the overall study population, and furthermore yields nonsensical estimates in the presence of dichotomous confounders. Default commands in popular statistical software packages often lead to inadvertent misapplication of prediction at the means.
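The three estimators can be written side by side. The notation is an assumption made here for illustration and is not taken from the article: a fitted model with intercept \beta_0, exposure coefficient \beta_1 and confounder coefficients \beta_2; exposure level e; confounder vector z_i for observation i; and, as a simplification, a target population equal to the study sample of n observations, so that the standardization weights are all 1/n.

```latex
\begin{align*}
\text{Marginal standardization:}\quad
  \hat{P}_{\mathrm{MS}}(e) &= \frac{1}{n}\sum_{i=1}^{n}
    \operatorname{expit}\!\left(\hat\beta_0 + \hat\beta_1 e + \hat\beta_2^{\top} z_i\right) \\
\text{Prediction at the means:}\quad
  \hat{P}_{\mathrm{mean}}(e) &=
    \operatorname{expit}\!\left(\hat\beta_0 + \hat\beta_1 e + \hat\beta_2^{\top} \bar{z}\right),
    \qquad \bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i \\
\text{Prediction at the modes:}\quad
  \hat{P}_{\mathrm{mode}}(e) &=
    \operatorname{expit}\!\left(\hat\beta_0 + \hat\beta_1 e + \hat\beta_2^{\top} z^{\mathrm{mode}}\right) \\
\text{where}\quad \operatorname{expit}(x) &= \frac{1}{1 + e^{-x}}
\end{align*}
```

Because expit is nonlinear, averaging the predictions (first line) generally gives a different value from predicting at the averaged confounders (second line), which is why the methods answer questions about different populations.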

Methods: Using an applied example, we demonstrate discrepancies in predicted probabilities across these methods, discuss implications for interpretation and provide syntax for SAS and Stata.

Results: Marginal standardization allows inference to the total population from which data are drawn. Prediction at the modes or means allows inference only to the relevant stratum of observations. With dichotomous confounders, prediction at the means corresponds to a stratum that does not include any real-life observations.
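A small numerical sketch may make these distinctions concrete. It is not the article's applied example or its SAS/Stata syntax: the coefficients, the ten-person sample and all function names below are hypothetical, and the model is taken as already fitted so that the three calculations can be compared directly.

```python
import numpy as np

# Hypothetical, already-fitted logistic model (illustrative values only):
# logit P(outcome) = intercept + b_exposure * exposure + b_smoker * smoker
INTERCEPT = -2.0
B_EXPOSURE = 1.0
B_SMOKER = 1.5

# Hypothetical sample of one dichotomous confounder: 6 non-smokers, 4 smokers.
smoker = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])


def expit(x):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + np.exp(-x))


def predict(exposure, smoker_value):
    """Predicted probability at a given exposure and confounder value(s)."""
    z = np.asarray(smoker_value, dtype=float)
    return expit(INTERCEPT + B_EXPOSURE * exposure + B_SMOKER * z)


def marginal_standardization(exposure):
    """Average the predictions over the observed confounder distribution."""
    return float(np.mean(predict(exposure, smoker)))


def prediction_at_mean(exposure):
    """Predict once, at the sample mean of the confounder (here 0.4)."""
    return float(predict(exposure, smoker.mean()))


def prediction_at_mode(exposure):
    """Predict once, at the most frequent confounder value (here 0)."""
    values, counts = np.unique(smoker, return_counts=True)
    return float(predict(exposure, values[np.argmax(counts)]))


if __name__ == "__main__":
    for name, fn in [
        ("marginal standardization", marginal_standardization),
        ("prediction at the mean", prediction_at_mean),
        ("prediction at the mode", prediction_at_mode),
    ]:
        print(f"{name}: {fn(1):.4f}")
    # No one in the sample has smoker == 0.4, so the "mean" stratum is empty.
    print("mean value observed in data:", bool(np.any(smoker == smoker.mean())))
```

In this toy sample, prediction at the mean evaluates the model at smoker = 0.4, a value that no observation takes, whereas marginal standardization averages over the people who are actually in the sample and prediction at the mode describes non-smokers only.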

Conclusions: Marginal standardization is the appropriate method when making inference to the overall population. Other methods should be used with caution, and prediction at the means should not be used with binary confounders. Stata, but not SAS, incorporates simple methods for marginal standardization.

Keywords: Bias; logistic regression; predicted probabilities; risk; standardization; target population.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Confounding Factors, Epidemiologic*
  • Epidemiologic Research Design*
  • Logistic Models
  • Models, Statistical
  • Probability*