To develop personalized screening and surveillance strategies, the information required to superimpose state-specific covariates onto the multi-step progression of disease natural history often depends on screening data from the entire population, which are costly to obtain and often infeasible, particularly when a new biomarker is proposed. Following Prentice's case-cohort concept, a non-standard case-cohort design from a previous study has been adapted to construct a multistate disease natural history with two-stage sampling. Nonetheless, using data only from first screens may introduce length bias and fail to account for test sensitivity. Therefore, a new sampling-based Markov regression model and its variants are proposed to accommodate additional follow-up data from various detection modes, so that a state-specific, covariate-based multistate disease natural history can be constructed accurately and efficiently. Computer simulation algorithms for determining the required sample size and the sampling fraction of each detection mode were developed, based either on the power function or on the capacity of the screening program. The former is illustrated with breast cancer screening data, from which the effect size and the required sample size for the effect of BRCA on the multistate outcome of breast cancer were estimated. The latter is applied to population-based colorectal cancer screening data to identify the optimal sampling fraction of each detection mode.
Keywords: Markov exponential regression model; two-stage sampling design; multistate model.