To assess the calibration of a predictive model in a survival analysis setting, several authors have extended the Hosmer-Lemeshow goodness-of-fit test to survival data. Grønnesby and Borgan developed a test under the proportional hazards assumption, and Nam and D'Agostino developed a nonparametric test that is applicable in a more general survival setting for data with limited censoring. We analyze the performance of the two tests and show that the Grønnesby-Borgan test attains appropriate size in a variety of settings, whereas the Nam-D'Agostino method has a higher-than-nominal Type 1 error rate when there is more than trivial censoring. Both tests are sensitive to small cell sizes. We develop a modification of the Nam-D'Agostino test to allow for higher censoring rates. We show that this modified Nam-D'Agostino test has appropriate control of Type 1 error, has power comparable to that of the Grønnesby-Borgan test, and is applicable to settings other than proportional hazards. We also discuss the application to small cell sizes.
Keywords: calibration; goodness-of-fit; survival analysis.
Copyright © 2015 John Wiley & Sons, Ltd.