In cohort studies, it can be infeasible to collect specimens from the entire cohort. For example, to estimate the sensitivity of multiple multi-cancer detection (MCD) assays, we desire an extra 80 mL of cell-free DNA (cfDNA) blood, but collecting this much extra blood from everyone is too expensive. We propose a novel epidemiologic study design that efficiently oversamples those at highest baseline disease risk for specimen collection, increasing the number of future cases with cfDNA blood collection. The variance reduction from our risk-based subsample relative to a simple random (sub)sample (SRS) depends primarily on the ratio of the risk model's sensitivity to the fraction of the cohort selected for specimen collection, subject to a constraint on the risk model's specificity. In a simulation in which we selected the 34% of the Prostate, Lung, Colorectal, and Ovarian Screening Trial cohort at highest risk of lung cancer for cfDNA blood collection, the number of lung cancers was enriched 2.42-fold. The standard deviation of the estimated lung-cancer MCD sensitivity was reduced by 31%-33% relative to SRS. Risk-based collection of specimens from a subsample of the cohort could be a feasible and efficient approach to collecting extra specimens for molecular epidemiology.
Keywords: case-cohort; diagnostic testing; epidemiologic sampling design; nested case–control.
Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health 2024.
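The relationship stated in the abstract between risk-model sensitivity, the fraction of the cohort selected, and case enrichment can be illustrated with a short arithmetic sketch. It assumes only that an SRS of a given fraction captures, on average, that same fraction of future cases, whereas a risk-based subsample captures a share of cases equal to the risk model's sensitivity at the selection threshold. The function names are hypothetical, and the last function is a crude binomial-style approximation (standard deviation proportional to one over the square root of the number of cases), not the authors' variance formula. It ignores sampling weights and the specificity constraint, so it is not expected to reproduce the reported 31%-33% reduction exactly.

```python
import math


def enrichment_fold(risk_model_sensitivity: float, fraction_selected: float) -> float:
    """Expected case enrichment of a risk-based subsample versus an SRS.

    Assumption: an SRS of fraction f captures a fraction f of future cases,
    while selecting the highest-risk fraction f captures a fraction equal to
    the risk model's sensitivity at that threshold.
    """
    if not 0 < fraction_selected <= 1:
        raise ValueError("fraction_selected must be in (0, 1]")
    if not 0 <= risk_model_sensitivity <= 1:
        raise ValueError("risk_model_sensitivity must be in [0, 1]")
    return risk_model_sensitivity / fraction_selected


def implied_sensitivity(fold: float, fraction_selected: float) -> float:
    """Risk-model sensitivity implied by an enrichment fold and a selected fraction."""
    return fold * fraction_selected


def naive_sd_ratio(fold: float) -> float:
    """Crude SD ratio (risk-based / SRS), assuming SD scales as 1/sqrt(number of cases).

    This is an illustrative simplification, not the paper's variance formula.
    """
    return 1.0 / math.sqrt(fold)


if __name__ == "__main__":
    # Figures taken from the abstract: 34% selected, 2.42-fold enrichment.
    sens = implied_sensitivity(2.42, 0.34)
    print(f"Implied risk-model sensitivity: {sens:.4f}")
    print(f"Enrichment fold recovered: {enrichment_fold(sens, 0.34):.2f}")
    print(f"Naive SD ratio vs SRS: {naive_sd_ratio(2.42):.3f}")
```

Under these assumptions, selecting 34% of the cohort with 2.42-fold enrichment implies that roughly 82% of future lung cancers fall in the selected group. The naive SD ratio only indicates the order of magnitude of the precision gain.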