Efficient Estimation of Semiparametric Transformation Models for Two-Phase Cohort Studies

J Am Stat Assoc. 2014 Jan 1;109(505):371-383. doi: 10.1080/01621459.2013.842172.


Under two-phase cohort designs, such as case-cohort and nested case-control sampling, information on observed event times, event indicators, and inexpensive covariates is collected in the first phase. The first-phase information is then used to select subjects for measurements of expensive covariates in the second phase. Inexpensive covariates are also used in the data analysis to control for confounding and to evaluate interactions. This paper provides efficient estimation of semiparametric transformation models for such designs, accommodating both discrete and continuous covariates and allowing inexpensive and expensive covariates to be correlated. The estimation is based on the maximization of a modified nonparametric likelihood function through a generalization of the expectation-maximization algorithm. The resulting estimators are shown to be consistent, asymptotically normal, and asymptotically efficient, with easily estimated variances. Simulation studies demonstrate that the asymptotic approximations are accurate in practical situations. Empirical data from Wilms' tumor studies and the Atherosclerosis Risk in Communities (ARIC) study are presented.
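To make the two-phase selection step concrete, the following minimal sketch illustrates one simple variant of case-cohort sampling: a simple random subcohort is drawn from the full cohort, and every subject with an observed event is added to it. It illustrates the sampling design only, not the estimation procedure of the paper. The function name `case_cohort_sample`, the simulated event indicators, the 10% event rate, and the subcohort size are all hypothetical choices made for illustration.

```python
import random


def case_cohort_sample(events, subcohort_size, seed=0):
    """Return sorted indices of subjects selected for phase-two measurement.

    events: list of 0/1 event indicators collected in phase one.
    subcohort_size: number of subjects in the simple random subcohort.
    Every case (event == 1) is selected, whether or not it was drawn
    into the subcohort.
    """
    rng = random.Random(seed)
    n = len(events)
    # Phase-two subcohort: simple random sample without replacement.
    subcohort = set(rng.sample(range(n), subcohort_size))
    # All cases observed in phase one are also selected.
    cases = {i for i, d in enumerate(events) if d == 1}
    return sorted(subcohort | cases)


# Hypothetical phase-one data: 1000 subjects, roughly 10% with an event.
data_rng = random.Random(1)
events = [1 if data_rng.random() < 0.1 else 0 for _ in range(1000)]

selected = case_cohort_sample(events, subcohort_size=150)
print(f"{len(selected)} of {len(events)} subjects selected for phase two")
```

Only the selected subjects would have their expensive covariates measured; the remaining subjects contribute their event times, event indicators, and inexpensive covariates to the analysis.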

Keywords: Case-cohort design; EM algorithm; Kernel estimation; Nested case-control sampling; Nonparametric likelihood; Semiparametric efficiency.