Two-stage sampling designs for external validation of personal risk models

Stat Methods Med Res. 2016 Aug;25(4):1313-29. doi: 10.1177/0962280213480420. Epub 2013 Apr 16.


We propose a cost-effective sampling design and estimating procedure for validating personal risk models using right-censored cohort data. Validation involves using each subject's covariates, as ascertained at cohort entry, in a risk model (specified independently of the data) to assign him/her a probability of an adverse outcome within a future time period. Subjects are then grouped according to the magnitudes of their assigned risks, and within each group, the mean assigned risk is compared with the probability of outcome occurrence as estimated using the follow-up data. Such validation presents two complications. First, in the presence of right-censoring, estimating the probability of developing the outcomes before death requires competing risk analysis. Second, for rare outcomes, validation using the full cohort requires assembling covariates and assigning risks to thousands of subjects. This can be costly when some covariates involve analyzing biological specimens. A two-stage sampling design addresses this problem by assembling covariates and assigning risks only to those subjects most informative for estimating key parameters. We use this design to estimate the outcome probabilities needed to evaluate model performance and we provide theoretical and bootstrap estimates of their variances. We also describe how to choose two-stage designs with minimal efficiency loss for a parameter of interest when the quantities determining optimality are unknown at the time of design. We illustrate these methods by using subjects in the California Teachers Study to validate ovarian cancer risk models. We find that a design with optimal efficiency for one performance parameter need not be so for others, and trade-offs will be required. A two-stage design that samples all outcome-positive subjects and more outcome-negative than censored subjects will perform well in most circumstances. The methods are implemented in Risk Model Assessment Program, an R program freely available at

Keywords: Bootstrap; calibration; competing risks; discrimination; personal risk models; two-stage sampling.

MeSH terms

  • California
  • Female
  • Genetic Predisposition to Disease
  • Humans
  • Ovarian Neoplasms* / genetics
  • Probability
  • Risk Assessment / methods*
  • Sensitivity and Specificity