Background: Many recent long-term prospective studies have involved the serial collection and storage of blood or tissue specimens. This has spurred nested case-control studies in which some of the stored specimens are tested for markers that might predict cancer. Until now there has been little guidance on the statistical design and analysis of these studies.
Methods: To develop statistical guidelines, we considered the purpose of these studies, the types of bias that can arise, and the opportunities for extracting additional information.
Results: We propose the following guidelines: (1) For the clearest interpretation, statistics should be based on false and true positive rates, not odds ratios or relative risks. (2) To avoid overdiagnosis bias, cases should be diagnosed as a result of symptoms rather than as a result of screening. (3) To minimize selection bias, the spectrum of control conditions should be the same in the study population and the target screening population. (4) To extract additional information, criteria for a positive test should be based on combinations of individual markers and on changes in marker levels over time. (5) To avoid overfitting, the criteria for a positive marker combination developed in a training sample should be evaluated in a random test sample from the same study and, if possible, in a validation sample from another study. (6) To identify biomarkers with true and false positive rates similar to those of mammography, the training, test, and validation samples should each include at least 110 randomly selected subjects without cancer and 70 subjects with cancer.
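As an illustration of guidelines (1) and (5), the following minimal Python sketch computes true and false positive rates from raw counts and randomly splits subjects into training and test samples. The function names, the counts of positive tests, and the subject totals are hypothetical and chosen only for illustration; they are not results from any study.

```python
import random


def positive_rates(cases_positive, n_cases, controls_positive, n_controls):
    """Return (true positive rate, false positive rate) from raw counts."""
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("need at least one case and one control")
    tpr = cases_positive / n_cases        # fraction of cancer cases testing positive
    fpr = controls_positive / n_controls  # fraction of non-cancer controls testing positive
    return tpr, fpr


def random_split(subject_ids, seed):
    """Randomly split subject IDs into training and test halves."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Hypothetical counts: 56 of 70 cases and 11 of 110 controls test positive.
tpr, fpr = positive_rates(cases_positive=56, n_cases=70,
                          controls_positive=11, n_controls=110)

# Hypothetical study with 140 cases and 220 controls. Cases and controls are
# split separately so that each sample keeps 70 cases and 110 controls.
train_cases, test_cases = random_split(range(140), seed=0)
train_controls, test_controls = random_split(range(220), seed=0)
```

Splitting cases and controls separately is a design choice made here so that both samples meet the minimum sizes in guideline (6); a validation sample from another study would be handled outside this sketch.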
Conclusion: These guidelines help ensure good practice in the design and analysis of nested case-control studies of early detection biomarkers.