Tests of calibration and goodness-of-fit in the survival setting

Stat Med. 2015 May 10;34(10):1659-80. doi: 10.1002/sim.6428. Epub 2015 Feb 11.


To assess the calibration of a predictive model in a survival analysis setting, several authors have extended the Hosmer-Lemeshow goodness-of-fit test to survival data. Grønnesby and Borgan developed a test under the proportional hazards assumption, and Nam and D'Agostino developed a nonparametric test that is applicable in a more general survival setting for data with limited censoring. We analyze the performance of the two tests and show that the Grønnesby-Borgan test attains appropriate size in a variety of settings, whereas the Nam-D'Agostino method has a higher-than-nominal Type 1 error rate when censoring is more than trivial. Both tests are sensitive to small cell sizes. We develop a modification of the Nam-D'Agostino test to allow for higher censoring rates. We show that this modified Nam-D'Agostino test has appropriate control of Type 1 error and power comparable to that of the Grønnesby-Borgan test, and that it is applicable to settings other than proportional hazards. We also discuss the application to small cell sizes.
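The grouped comparison behind Hosmer-Lemeshow-type tests for survival data can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract gives no formulas, so the sketch makes several assumptions: subjects are split into groups by predicted risk, the observed event probability in each group is the Kaplan-Meier estimate at a fixed time horizon, and each squared difference between observed and mean predicted risk is scaled by the Greenwood variance of the Kaplan-Meier estimate. The function names `km_event_prob` and `grouped_calibration_stat` are hypothetical, as is the suggested reference distribution (chi-square with groups minus one degrees of freedom).

```python
def km_event_prob(times, events, horizon):
    """Return (Kaplan-Meier event probability by `horizon`, Greenwood variance)."""
    surv, greenwood_sum = 1.0, 0.0
    # Distinct event times up to the horizon, in increasing order.
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        d = sum(1 for u, e in zip(times, events) if e and u == t)
        surv *= 1.0 - d / at_risk
        if at_risk > d:
            greenwood_sum += d / (at_risk * (at_risk - d))
    return 1.0 - surv, surv ** 2 * greenwood_sum


def grouped_calibration_stat(pred, times, events, horizon, groups=10):
    """Sum over risk groups of (observed - mean predicted)^2 / Greenwood variance.

    `pred` holds each subject's predicted event probability by `horizon`.
    Returns (statistic, assumed degrees of freedom = groups - 1).
    """
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    size, extra = divmod(len(order), groups)
    stat, start = 0.0, 0
    for g in range(groups):
        stop = start + size + (1 if g < extra else 0)
        idx = order[start:stop]
        start = stop
        obs, var = km_event_prob(
            [times[i] for i in idx], [events[i] for i in idx], horizon
        )
        if var <= 0.0:
            # Small cells: no events, or nobody left at risk, before the horizon.
            raise ValueError("zero variance in group %d; merge small groups" % g)
        mean_pred = sum(pred[i] for i in idx) / len(idx)
        stat += (obs - mean_pred) ** 2 / var
    return stat, groups - 1


# Toy data: two risk groups of four subjects each, one censored subject per group.
pred = [0.1, 0.1, 0.1, 0.1, 0.6, 0.6, 0.6, 0.6]
times = [1, 2, 3, 4, 1, 2, 3, 4]
events = [1, 0, 1, 1, 1, 0, 1, 1]
stat, df = grouped_calibration_stat(pred, times, events, horizon=3.5, groups=2)
print(round(stat, 4), df)
```

The zero-variance check reflects the sensitivity to small cell sizes noted above: a group with no events before the horizon gives no usable variance estimate, so in this sketch such groups must be merged before the statistic is computed.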

Keywords: calibration; goodness-of-fit; survival analysis.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adult
  • Aged
  • Bias
  • Calibration
  • Computer Simulation
  • Coronary Disease / epidemiology
  • Coronary Disease / etiology
  • Female
  • Humans
  • Longitudinal Studies
  • Middle Aged
  • Models, Theoretical*
  • Proportional Hazards Models
  • Risk Assessment / methods
  • Sample Size
  • Statistics, Nonparametric
  • Survival Analysis*