Background: High-throughput laboratory technologies coupled with sophisticated bioinformatics algorithms have tremendous potential for discovering novel biomarkers, or profiles of biomarkers, that could serve as predictors of disease risk, response to treatment or prognosis. We discuss methodological issues in wedding high-throughput approaches for biomarker discovery with the case-control study designs typically used in biomarker discovery studies, especially focusing on nested case-control designs.
Methods: We review principles for nested case-control study design in relation to biomarker discovery studies and describe how the efficiency of biomarker discovery can be effected by study design choices. We develop a simulated prostate cancer cohort data set and a series of biomarker discovery case-control studies nested within the cohort to illustrate how study design choices can influence biomarker discovery process.
Result: Common elements of nested case-control design, incidence density sampling and matching of controls to cases are not typically factored correctly into biomarker discovery analyses, inducing bias in the discovery process. We illustrate how incidence density sampling and matching of controls to cases reduce the apparent specificity of truly valid biomarkers 'discovered' in a nested case-control study. We also propose and demonstrate a new case-control matching protocol, we call 'antimatching', that improves the efficiency of biomarker discovery studies.
Conclusions: For a valid, but as yet undiscovered, biomarker(s) disjunctions between correctly designed epidemiologic studies and the practice of biomarker discovery reduce the likelihood that true biomarker(s) will be discovered and increases the false-positive discovery rate.
© 2012 The Authors. European Journal of Clinical Investigation © 2012 Stichting European Society for Clinical Investigation Journal Foundation.