Estimating incident population distribution from prevalent data

Biometrics. 2012 Jun;68(2):521-31. doi: 10.1111/j.1541-0420.2011.01708.x. Epub 2012 Feb 7.


A prevalent sample consists of individuals who have experienced disease incidence but not the failure event at the sampling time. We discuss methods for estimating the distribution function of a random vector defined at baseline for an incident disease population when data are collected by prevalent sampling. A prevalent sampling design is often more focused and economical than an incident study design for studying the survival distribution of a diseased population, but prevalent samples are biased by design. Subjects with longer survival times are more likely to be included in a prevalent cohort, and other baseline variables of interest that are correlated with survival time are also subject to the sampling bias induced by the prevalent sampling scheme. Without recognition of this bias, applying the empirical distribution function to estimate the population distribution of baseline variables can lead to serious bias. In this article, nonparametric and semiparametric methods are developed for estimating the distribution of baseline variables using prevalent data.
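The bias described above can be illustrated with a small simulation. The sketch below is not the authors' nonparametric or semiparametric method. It is a simpler, swapped-in correction that assumes pure length-biased sampling (a stationary incidence process, so a subject enters the prevalent cohort with probability proportional to their survival time T) and fully observed, uncensored survival times. Under those assumptions, weighting each sampled subject by 1/T undoes the bias. The helpers `weighted_baseline_cdf` and `empirical_cdf` and the simulation model (X ~ N(0, 1), T | X ~ Gamma(shape 3, scale exp(X))) are hypothetical and chosen only for illustration.

```python
import numpy as np


def weighted_baseline_cdf(x, t, grid):
    """Hypothetical helper: inverse-length-weighted CDF of a baseline variable.

    Assumes length-biased sampling (prevalent cohort from a stationary
    incidence process) and fully observed survival times t_i > 0, so subject i
    was sampled with probability proportional to t_i. Weighting by 1/t_i
    undoes that bias. Censoring is NOT handled here.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    w = 1.0 / t
    w /= w.sum()  # normalise weights to sum to one
    return np.array([w[x <= g].sum() for g in np.atleast_1d(grid)])


def empirical_cdf(x, grid):
    """Naive empirical distribution function that ignores the sampling bias."""
    x = np.asarray(x, dtype=float)
    return np.array([np.mean(x <= g) for g in np.atleast_1d(grid)])


# Hypothetical incident population: baseline X ~ N(0, 1) and survival
# T | X ~ Gamma(shape=3, scale=exp(X)), so larger X means longer survival.
rng = np.random.default_rng(2012)
N = 500_000
x_pop = rng.standard_normal(N)
t_pop = rng.gamma(shape=3.0, scale=np.exp(x_pop))

# Prevalent (length-biased) sample: inclusion probability proportional to T.
n = 20_000
idx = rng.choice(N, size=n, replace=True, p=t_pop / t_pop.sum())
x_s, t_s = x_pop[idx], t_pop[idx]

naive = empirical_cdf(x_s, 0.0)[0]
weighted = weighted_baseline_cdf(x_s, t_s, 0.0)[0]
print(f"population P(X <= 0) = {np.mean(x_pop <= 0.0):.3f}")
print(f"naive ECDF           = {naive:.3f}")
print(f"1/T weighted         = {weighted:.3f}")
```

Under this particular model, length-biased sampling shifts the sampled baseline distribution from N(0, 1) to N(1, 1). The naive estimate of P(X ≤ 0) should therefore land near Φ(-1) ≈ 0.16 instead of the population value of about 0.5, while the 1/T-weighted estimate should land close to 0.5. In real prevalent cohorts, survival is typically right-censored and only partly observed, which is why more careful estimators such as those developed in the article are needed.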

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Bias
  • Biometry / methods*
  • Cohort Studies
  • Computer Simulation
  • Cross-Sectional Studies
  • Data Interpretation, Statistical
  • Demography / statistics & numerical data*
  • Female
  • Humans
  • Incidence
  • Likelihood Functions
  • Models, Statistical
  • Monte Carlo Method
  • Ovarian Neoplasms / epidemiology
  • Ovarian Neoplasms / mortality
  • Ovarian Neoplasms / pathology
  • Proportional Hazards Models
  • Regression Analysis
  • Sampling Studies
  • Statistics, Nonparametric