Properties of R² statistics for logistic regression

Stat Med. 2006 Apr 30;25(8):1383-95. doi: 10.1002/sim.2300.


Various R² statistics have been proposed for logistic regression to quantify the extent to which the binary response can be predicted by a given logistic regression model and covariates. We study the asymptotic properties of three popular variance-based R² statistics. We find that two of them, the sum of squares and the squared Pearson correlation, have the same asymptotic distribution, whereas the third, Gini's concentration measure, has a different asymptotic behaviour and may overstate the predictive value of the model and covariates when the model is mis-specified. Our result not only provides a theoretical basis for the findings in previous empirical and numerical work, but also leads to asymptotic confidence intervals. Statistical variability can then be taken into account when assessing the predictive value of a logistic regression model.
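
As an illustration of the three statistics named in the abstract, the following is a minimal sketch that computes them from observed binary responses and fitted probabilities. The formulas are the commonly used textbook definitions and are an assumption here, since the abstract itself does not state them; the function name `r2_statistics` and its arguments are hypothetical. The asymptotic confidence intervals mentioned in the abstract are not implemented.

```python
from math import fsum


def r2_statistics(y, p_hat):
    """Point estimates of three variance-based R^2 statistics (assumed definitions).

    y     : observed binary responses (0 or 1)
    p_hat : fitted probabilities from some logistic regression model
    """
    n = len(y)
    if n == 0 or n != len(p_hat):
        raise ValueError("y and p_hat must be non-empty and of equal length")

    y_bar = fsum(y) / n
    p_bar = fsum(p_hat) / n

    # Total sum of squares; equals n * y_bar * (1 - y_bar) for a binary response.
    ss_total = fsum((yi - y_bar) ** 2 for yi in y)
    # Spread of the fitted probabilities around their mean.
    ss_fitted = fsum((pi - p_bar) ** 2 for pi in p_hat)
    if ss_total == 0 or ss_fitted == 0:
        raise ValueError("statistics are undefined when y or p_hat is constant")

    # Sum-of-squares R^2: 1 - residual SS / total SS.
    ss_resid = fsum((yi - pi) ** 2 for yi, pi in zip(y, p_hat))
    r2_ss = 1 - ss_resid / ss_total

    # Squared Pearson correlation between y and p_hat.
    cross = fsum((yi - y_bar) * (pi - p_bar) for yi, pi in zip(y, p_hat))
    r2_cor = cross ** 2 / (ss_total * ss_fitted)

    # Gini's concentration measure: 1 - sum p(1 - p) / (n * y_bar * (1 - y_bar)).
    r2_gini = 1 - fsum(pi * (1 - pi) for pi in p_hat) / ss_total

    return {"sum_of_squares": r2_ss, "pearson_squared": r2_cor, "gini": r2_gini}


# Toy data, invented for illustration only.
stats = r2_statistics([0, 0, 1, 1], [0.2, 0.4, 0.6, 0.8])
print(stats)
```

Note that the Gini measure depends on the observed responses only through their mean, which gives some intuition for why it can behave differently from the other two when the fitted probabilities come from a mis-specified model.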

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Biometry / methods*
  • Computer Simulation
  • Confidence Intervals
  • Epidemiologic Methods
  • Humans
  • Infant, Newborn
  • Infant, Premature
  • Infant, Very Low Birth Weight
  • Likelihood Functions*
  • Logistic Models*
  • Respiratory Distress Syndrome, Newborn / epidemiology
  • Risk Assessment / methods
  • Risk Factors
  • Statistical Distributions