Criteria for the validation of surrogate endpoints in randomized experiments

Biometrics. 1998 Sep;54(3):1014-29.


The validation of surrogate endpoints has been studied by Prentice (1989, Statistics in Medicine 8, 431-440) and Freedman, Graubard, and Schatzkin (1992, Statistics in Medicine 11, 167-178). We extend their proposals in the cases where the surrogate and the final endpoints are both binary or both normally distributed. Letting T and S be random variables that denote the true and surrogate endpoints, respectively, and Z be an indicator variable for treatment, Prentice's criteria are fulfilled if Z has a significant effect on T and on S, if S has a significant effect on T, and if Z has no effect on T given S. Freedman et al. relaxed the latter criterion by estimating PE, the proportion of the effect of Z on T that is explained by S, and by requiring that the lower confidence limit of PE be larger than some proportion, say 0.5 or 0.75. This condition can only be verified if the treatment has a massively significant effect on the true endpoint, a rare situation. We argue that two other quantities must be considered in the validation of a surrogate endpoint: RE, the effect of Z on T relative to that of Z on S, and γ_Z, the association between S and T after adjustment for Z. A surrogate is said to be perfect at the individual level when there is a perfect association between the surrogate and the final endpoint after adjustment for treatment. A surrogate is said to be perfect at the population level if RE is 1. A perfect surrogate fulfills both conditions, in which case S and T are identical up to a deterministic transformation. Fieller's theorem is used for the estimation of PE, RE, and their respective confidence intervals. Logistic regression models and the global odds ratio model studied by Dale (1986, Biometrics 42, 909-917) are used for binary endpoints. Linear models are employed for continuous endpoints. In order to be of practical value, the validation of surrogate endpoints is shown to require large numbers of observations.
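
For the continuous case, the quantities named in the abstract can be illustrated with a minimal sketch. It assumes three ordinary least-squares models (S on Z, T on Z, and T on Z and S), takes PE as one minus the ratio of the adjusted to the unadjusted treatment effect on T, takes RE as the ratio of the treatment effects on T and on S, and uses the coefficient of S in the adjusted model as the adjusted association. The parametrization, the function names, the normal critical value 1.96, and the simulated data are all assumptions made for illustration; they are not taken from the paper. Only the Fieller interval for RE is shown, because the interval for PE needs the covariance between estimates from two different models.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and residuals for y = X @ coef + error."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef


def surrogate_measures(z, s, t, crit=1.96):
    """Point estimates of PE, RE, the adjusted association, and a Fieller CI for RE.

    Assumed models (illustrative parametrization):
        S = mu_S + alpha  * Z               + error
        T = mu_T + beta   * Z               + error
        T = mu   + beta_S * Z + gamma_Z * S + error
    """
    z, s, t = (np.asarray(v, dtype=float) for v in (z, s, t))
    n = z.size
    X = np.column_stack([np.ones(n), z])        # intercept + treatment
    (_, alpha), res_s = ols(X, s)               # effect of Z on S
    (_, beta), res_t = ols(X, t)                # effect of Z on T
    Xa = np.column_stack([np.ones(n), z, s])    # intercept + treatment + surrogate
    (_, beta_s, gamma_z), _ = ols(Xa, t)        # effect of Z on T given S

    pe = 1.0 - beta_s / beta                    # proportion explained
    re = beta / alpha                           # relative effect

    # Both unadjusted models share the design matrix X, so the variances and
    # the covariance of (alpha_hat, beta_hat) share one (X'X)^{-1} element.
    dof = n - 2
    k = np.linalg.inv(X.T @ X)[1, 1]
    v_a = res_s @ res_s / dof * k
    v_b = res_t @ res_t / dof * k
    c_ab = res_s @ res_t / dof * k

    # Fieller: all theta with (beta - theta*alpha)^2 <= crit^2 * Var(beta - theta*alpha),
    # i.e. A*theta^2 - 2*B*theta + C <= 0.
    A = alpha**2 - crit**2 * v_a
    B = alpha * beta - crit**2 * c_ab
    C = beta**2 - crit**2 * v_b
    disc = B**2 - A * C
    if A > 0 and disc >= 0:
        half = np.sqrt(disc)
        re_ci = ((B - half) / A, (B + half) / A)
    else:
        re_ci = None                            # confidence set is unbounded

    return {"PE": pe, "RE": re, "gamma_Z": gamma_z, "RE_CI": re_ci}


# Hypothetical simulated trial: the treatment acts on T only through S.
rng = np.random.default_rng(0)
n = 20000
z = np.arange(n) % 2                            # balanced 1:1 allocation
s = 1.0 + 2.0 * z + rng.normal(size=n)
t = 0.5 + 1.5 * s + rng.normal(size=n)
result = surrogate_measures(z, s, t)
print(result)
```

In this simulated setting the true values are PE = 1, RE = 1.5, and an adjusted association of 1.5. The interval is returned as `None` when the estimated effect of Z on S is not clearly different from zero, because the Fieller confidence set is then unbounded. Rerunning the sketch with a much smaller `n` or a weaker treatment effect widens the interval quickly, which is consistent with the abstract's remark that validation requires large numbers of observations.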

MeSH terms

  • Biometry / methods*
  • Humans
  • Models, Statistical
  • Randomized Controlled Trials as Topic / methods*
  • Reproducibility of Results
  • Treatment Outcome