Population-Calibrated Gene Characterization: Estimating Age at Onset Distributions Associated With Cancer Genes

J Am Stat Assoc. 2005;100(470):399-409. doi: 10.1198/016214505000000196.


Phenotypic characterization of rare disease genes poses a significant statistical challenge, but the need to do so is clear. Clinical management of patients carrying a disease gene depends crucially on an accurate characterization of the genetically predisposed disease, including its likelihood of occurrence among mutation carriers, its natural history, and its response to treatment. We propose a formal yet practical method for correcting the bias that arises from ignoring ascertainment (the sampling mechanism) when quantifying the association between genotype and disease using data on high-risk families. The approach is more statistically efficient than conditioning on the variables used in sampling. In it, the likelihood is adjusted by a factor that is a function of sampling weights in strata defined by those variables. It requires that these variables, and the sampling probabilities in the strata they define, either be known or be estimable. Estimating the sampling probabilities requires a second, population-based dataset. As an example, we derive ascertainment-corrected estimates of penetrance for the breast cancer susceptibility genes BRCA1 and BRCA2. The Bayesian analysis that we use incorporates a modified segregation model and prior data on penetrance derived from the literature. Markov chain Monte Carlo methods are used for inference.
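To make the idea of stratum sampling weights concrete, the sketch below uses a generic inverse-probability-weighted Bernoulli likelihood. This is a deliberately simplified stand-in and is not the authors' adjustment factor, segregation model, or Bayesian MCMC analysis. It ignores age at onset and genotype uncertainty and treats "penetrance" as a single proportion of affected carriers. The stratum labels, sampling probabilities, and counts are hypothetical. The sampling probabilities are assumed known, for example estimated from a separate population-based dataset. All function names are illustrative.

```python
import math

# HYPOTHETICAL toy setting: each record is (stratum, affected) for a mutation
# carrier. Strata are defined by the variable used in sampling (e.g. family
# history). Per-stratum sampling probabilities pi_s are ASSUMED known, e.g.
# estimated from a separate population-based dataset.


def stratum_weights(sampling_prob):
    """Inverse-probability weights w_s = 1 / pi_s for each sampling stratum."""
    return {s: 1.0 / p for s, p in sampling_prob.items()}


def weighted_loglik(theta, records, weights):
    """Weighted Bernoulli log-likelihood: sum_i w_{s(i)} * log f(y_i | theta)."""
    total = 0.0
    for stratum, affected in records:
        p = theta if affected else 1.0 - theta
        total += weights[stratum] * math.log(p)
    return total


def naive_mle(records):
    """Unweighted estimate that ignores how families were ascertained."""
    return sum(y for _, y in records) / len(records)


def weighted_mle(records, weights):
    """Closed-form maximizer of the weighted Bernoulli log-likelihood."""
    num = sum(weights[s] * y for s, y in records)
    den = sum(weights[s] for s, _ in records)
    return num / den


# Hypothetical sample: high-risk families are heavily over-sampled.
records = (
    [("high", 1)] * 30 + [("high", 0)] * 10   # 40 carriers from high-risk stratum
    + [("low", 1)] * 2 + [("low", 0)] * 8     # 10 carriers from low-risk stratum
)
sampling_prob = {"high": 0.5, "low": 0.01}   # assumed pi_s per stratum
weights = stratum_weights(sampling_prob)

naive = naive_mle(records)                   # 32 / 50
corrected = weighted_mle(records, weights)   # (30*2 + 2*100) / (40*2 + 10*100)
print(f"naive={naive:.3f} corrected={corrected:.3f}")
```

With these made-up numbers the naive estimate is 0.640 and the weighted estimate is about 0.241. This shows how over-sampling high-risk families inflates an uncorrected penetrance estimate. Weighting yields a consistent pseudo-likelihood but is generally less efficient than a full likelihood-based correction of the kind the abstract describes.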

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.