Improving External Validity of Epidemiologic Cohort Analyses: A Kernel Weighting Approach

J R Stat Soc Ser A Stat Soc. 2020 Jun;183(3):1293-1311. doi: 10.1111/rssa.12564. Epub 2020 Apr 25.


For various reasons, cohort studies generally forgo the probability sampling required to obtain population-representative samples. As a result, such cohorts lack population representativeness, which invalidates estimates of population prevalences for novel health factors that are available only in cohorts. To improve the external validity of estimates from cohorts, we propose a kernel weighting (KW) approach that uses survey data as a reference to create pseudo-weights for cohorts. A jackknife variance estimator is proposed for the KW estimates. In simulations, the KW method outperformed two existing propensity-score-based weighting methods in mean-squared error while maintaining confidence interval coverage. We applied all methods to estimate US population mortality and the prevalences of various diseases from the non-representative US NIH-AARP cohort, using the sample from the US-representative National Health Interview Survey (NHIS) as the reference. Assuming that the NHIS estimates are correct, the KW approach yielded generally less biased estimates than the existing propensity-score-based weighting methods.
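The abstract does not spell out how the pseudo-weights are computed, so the following is only a minimal sketch of one way kernel weighting against a reference survey could work. It assumes that a propensity score (the estimated probability of cohort membership) has already been fitted for every cohort and survey unit, that a Gaussian kernel is used, and that the bandwidth is supplied by the user. The function name `kernel_pseudo_weights`, its arguments, and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def kernel_pseudo_weights(cohort_scores, survey_scores, survey_weights, bandwidth):
    """Spread each survey unit's sample weight over the cohort units.

    Hypothetical sketch: each survey unit hands out its weight to cohort
    units in proportion to a Gaussian kernel of the distance between their
    propensity scores. Assumes every survey unit has at least one cohort
    unit close enough that its kernel column does not underflow to zero.
    """
    cohort_scores = np.asarray(cohort_scores, dtype=float)
    survey_scores = np.asarray(survey_scores, dtype=float)
    survey_weights = np.asarray(survey_weights, dtype=float)

    # diff[i, j]: scaled score distance between cohort unit i and survey unit j
    diff = (cohort_scores[:, None] - survey_scores[None, :]) / bandwidth

    # Unnormalised Gaussian kernel; the constant cancels in the normalisation
    kernel = np.exp(-0.5 * diff**2)

    # Normalise each column so the shares of survey unit j sum to one
    shares = kernel / kernel.sum(axis=0, keepdims=True)

    # Pseudo-weight of cohort unit i: sum over j of share[i, j] * weight[j]
    return shares @ survey_weights


# Toy inputs (made up for illustration only)
cohort_scores = np.array([0.2, 0.4, 0.6, 0.8])
survey_scores = np.array([0.1, 0.5, 0.9])
survey_weights = np.array([100.0, 200.0, 300.0])

pseudo_weights = kernel_pseudo_weights(
    cohort_scores, survey_scores, survey_weights, bandwidth=0.2
)
```

Because each survey unit's shares sum to one, the pseudo-weights in this sketch add up to the total of the survey weights, so the weighted cohort represents the same population size as the reference survey.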

Keywords: Cohort studies; Complex survey sample; Jackknife variance estimation; Kernel smoothing; Propensity score weighting; Taylor series linearization variance.