Problems in the Definition, Interpretation, and Evaluation of Genetic Heterogeneity

Am J Hum Genet. 2001 Feb;68(2):457-65. doi: 10.1086/318186. Epub 2001 Jan 19.


Suppose that we wish to classify families with multiple cases of disease into one of three categories: those that segregate mutations of a gene of interest, those that segregate mutations of other genes, and those whose disease is due to nonhereditary factors or chance. Among families in the first two categories (the hereditary families), we wish to estimate the proportion, p, of families that segregate mutations of the gene of interest. Although this proportion is a commonly accepted concept, it is well defined only with an unambiguous definition of "family." Even then, extraneous factors such as family sizes and structures can cause p to vary across different populations and, within a population, to be estimated differently by different studies. Restrictive assumptions about the disease are needed in order to avoid this undesirable variation. The assumptions require that mutations of all disease-causing genes (i) have no effect on family size, (ii) have very low frequencies, and (iii) have penetrances that satisfy certain constraints. Despite the unverifiability of these assumptions, linkage studies often invoke them to estimate p, using the admixture likelihood introduced by Smith and discussed by Ott. We argue against this common practice, because (1) it also requires the stronger assumption of equal penetrances for all etiologically relevant genes; (2) even if all assumptions are met, estimates of p are sensitive to misspecification of the unknown phenocopy rate; and (3) even if all the necessary assumptions are met and the phenocopy rate is correctly specified, estimates of p that are obtained by linkage programs such as HOMOG and GENEHUNTER are based on the wrong likelihood and therefore are biased in the presence of phenocopies. We show how to correct these estimates; nevertheless, we do not recommend the use of parametric heterogeneity models in linkage analysis, even merely as a tool for increasing the statistical power to detect linkage. This is because the assumptions required by these models cannot be verified, and their violation could actually decrease power. Instead, we suggest that estimation of p be postponed until the relevant genes have been identified. Then their frequencies and penetrances can be estimated on the basis of population-based samples and can be used to obtain more-robust estimates of p for specific populations.
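
For readers unfamiliar with the admixture likelihood mentioned in the abstract, the following is a minimal sketch of its usual textbook form, in which each family contributes log10(p * 10^Z + (1 - p)) to a heterogeneity LOD score, where Z is that family's LOD score at a fixed recombination fraction. This is the standard, uncorrected form that the abstract criticizes; it is not the corrected likelihood the authors derive, and it is not the code of HOMOG or GENEHUNTER. The function names, the grid search, and the example LOD scores are all assumptions made for illustration.

```python
import math

def admixture_lod(family_lods, p):
    """Heterogeneity LOD under the standard admixture model.

    family_lods: per-family LOD scores Z_i at one fixed recombination
                 fraction (log10 likelihood ratios against no linkage).
    p:           assumed proportion of linked families, 0 <= p <= 1.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Each family is a mixture: linked with probability p, unlinked otherwise.
    return sum(math.log10(p * 10.0 ** z + (1.0 - p)) for z in family_lods)

def maximize_over_p(family_lods, steps=1000):
    """Grid search for the p that maximizes the admixture LOD (illustrative)."""
    grid = [i / steps for i in range(steps + 1)]
    best_p = max(grid, key=lambda p: admixture_lod(family_lods, p))
    return best_p, admixture_lod(family_lods, best_p)

# Hypothetical per-family LOD scores, invented for illustration only.
example_lods = [1.2, -0.8, 0.9, -1.5, 0.4]
p_hat, hlod = maximize_over_p(example_lods)
print(f"p_hat = {p_hat:.3f}, heterogeneity LOD = {hlod:.3f}")
```

At p = 0 the score is 0, and at p = 1 it reduces to the ordinary summed LOD score, so the maximized value can never fall below either. The abstract's point is that the maximizing p from this likelihood should not be read as a robust estimate of the true proportion, since it depends on unverifiable assumptions and on the phenocopy rate.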

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Family Health
  • Female
  • Genetic Heterogeneity*
  • Genetic Linkage
  • Genetic Predisposition to Disease / genetics*
  • Humans
  • Male
  • Models, Genetic
  • Mutation
  • Pedigree
  • Penetrance
  • Phenotype