We describe a flexible family of tests for evaluating the goodness of fit (calibration) of a pre-specified personal risk model to the outcomes observed in a longitudinal cohort. Such evaluation involves using the risk model to assign each subject an absolute risk of developing the outcome within a given time from cohort entry and comparing subjects' assigned risks with their observed outcomes. This comparison raises several difficulties. For example, subjects followed for only part of the risk period have unknown outcomes. Moreover, existing tests do not reveal the reasons for poor model fit when it occurs; poor fit can reflect misspecification of the model's hazards for the competing risks of outcome development and death. To address these issues, we extend the model-specified hazards for outcome and death, and use score statistics to test the null hypothesis that the extensions are unnecessary. In simulations, we applied risk models whose outcome and mortality hazards either agreed or disagreed with those generating the cohort data. The results show that the tests are sensitive to poor model fit, provide insight into the reasons for poor fit, and accommodate a wide range of model misspecification. We illustrate the methods by examining the calibration of two breast cancer risk models as applied to a cohort of participants in the Breast Cancer Family Registry. The methods can be implemented using the Risk Model Assessment Program, an R package freely available at http://stanford.edu/~ggong/rmap/.
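To make the notion of an assigned absolute risk concrete, the following minimal sketch computes the probability of developing the outcome within a given time, before death, from two cause-specific hazards. It assumes both hazards are constant over the risk period, a simplification chosen here for illustration and not a feature of the methods described above. The function name `absolute_risk` and its parameters are hypothetical and are not part of the Risk Model Assessment Program.

```python
import math


def absolute_risk(outcome_hazard, death_hazard, years):
    """Probability of developing the outcome within `years`, before death.

    Illustrative assumption: both cause-specific hazards are constant,
    so the risk has the closed form
        h1 / (h1 + h2) * (1 - exp(-(h1 + h2) * years)).
    """
    if outcome_hazard < 0 or death_hazard < 0 or years < 0:
        raise ValueError("hazards and time must be non-negative")
    total = outcome_hazard + death_hazard
    if total == 0.0:
        # No events of either type can occur.
        return 0.0
    return outcome_hazard / total * (1.0 - math.exp(-total * years))


# Hypothetical hazards (per year), chosen only for illustration.
risk_ignoring_death = absolute_risk(0.01, 0.0, 10)
risk_with_death = absolute_risk(0.01, 0.02, 10)
```

In this sketch a nonzero mortality hazard lowers the assigned absolute risk, which illustrates why a misspecified hazard for either the outcome or death can produce poor calibration.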
Keywords: absolute risk; cohort data; efficient score statistics; goodness of fit; personal disease risk; standardized residuals.
Copyright © 2014 John Wiley & Sons, Ltd.