Am J Hum Genet. 2004 Feb;74(2):306-16.
doi: 10.1086/381714. Epub 2004 Jan 19.

Inherent Bias Toward the Null Hypothesis in Conventional Multipoint Nonparametric Linkage Analysis

Free PMC article

Nicholas J Schork et al. Am J Hum Genet. 2004.

Abstract

Traditional nonparametric "multipoint" statistical procedures have been developed for assigning allele-sharing values at a locus of interest to pairs of relatives in linkage studies. These procedures attempt to accommodate a lack of informativity, nongenotyped loci, missing data, and related issues concerning the genetic markers used in a linkage study. However, such procedures often cannot adequately overcome these problems and, as a result, assign relevant relative pairs the allele-sharing values that are "expected" for those pairs. Assigning expected allele-sharing values to relative pairs when explicit allele-transmission information is lacking can bias traditional nonparametric linkage test statistics toward the null hypothesis of no locus effect. This bias is due to the use of expected values, rather than to a lack of information about actual allele sharing at relevant marker loci. The bias will vary from study to study, depending on the DNA markers, sample size, relative-pair types, and pedigree structures used, but it can be extremely pronounced and could contribute to the lack of consistent success in applying traditional nonparametric linkage analyses to complex human traits and diseases. There are several potential ways to overcome this problem, but their foundations require further research. We illustrate many of the issues concerning allele sharing with data from a large affected-sibling-pair study investigating the genetic basis of autism.
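
As a minimal sketch of the mechanism described above, the Python below compares a simple mean-sharing Z test for affected sibling pairs when uninformative pairs are removed versus assigned the expected value $\hat{\pi} = 0.5$. The test statistic, the hypothetical sharing values, and the names `mean_test_z`, `informative`, and `n_uninformative` are illustrative assumptions, not the study's method or data. The sketch assumes that a fully informative sibling pair's sharing proportion is 0, 0.5, or 1 with null probabilities 1/4, 1/2, and 1/4 (mean 0.5, variance 1/8).

```python
import math

# Assumption: null distribution of the IBD sharing proportion pi for a fully
# informative sib pair is pi = 0, 0.5, 1 with probabilities 1/4, 1/2, 1/4.
NULL_MEAN = 0.5
NULL_VAR = 1.0 / 8.0


def mean_test_z(pi_values):
    """Mean-sharing Z statistic: total excess sharing over 0.5, scaled by the
    fully informative null standard deviation for len(pi_values) pairs."""
    excess = sum(p - NULL_MEAN for p in pi_values)
    return excess / math.sqrt(len(pi_values) * NULL_VAR)


# Hypothetical pi-hat values for 8 informative affected sib pairs.
informative = [1.0, 0.5, 1.0, 0.5, 1.0, 0.5, 0.0, 1.0]
# Hypothetical number of pairs with no usable transmission information.
n_uninformative = 12

z_missing = mean_test_z(informative)                             # pairs removed
z_expected = mean_test_z(informative + [0.5] * n_uninformative)  # pairs set to 0.5

print(f"removed:  Z = {z_missing:.3f}")   # removed:  Z = 1.500
print(f"expected: Z = {z_expected:.3f}")  # expected: Z = 0.949
```

Because each imputed pair contributes zero to the numerator but adds to the denominator, Z shrinks by a factor of $\sqrt{n/(n+k)}$ for $n$ informative and $k$ imputed pairs, however much excess sharing the informative pairs show.
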

Figures

Figure 1
A mating between individuals heterozygous for the same alleles at a single locus that produces two heterozygous offspring (A). In this situation, the fraction of alleles shared identical by descent (IBD) by the siblings is not known with certainty: the siblings share either 0 alleles (B) or 2 alleles (C), each with probability 0.5. However, the conventional (and correct) estimate of the fraction of alleles shared by this sibling pair would be $\hat{\pi} = 0.5 \times 0 + 0.5 \times 1 = 0.5$.
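
The caption's estimate can be written as a short function, assuming (as is standard) that $\hat{\pi}$ is the expected proportion of alleles shared IBD, $0 \cdot P(\text{IBD}=0) + 0.5 \cdot P(\text{IBD}=1) + 1 \cdot P(\text{IBD}=2)$. The function name `pi_hat` is hypothetical. The second call illustrates the point behind the figure: a pair known to share exactly one allele IBD receives the same $\hat{\pi}$ as the ambiguous pair in panels B and C.

```python
def pi_hat(p0: float, p1: float, p2: float) -> float:
    """Expected proportion of alleles shared IBD, given the probabilities that
    a pair shares 0, 1, or 2 alleles IBD (proportions 0, 0.5, and 1)."""
    if abs(p0 + p1 + p2 - 1.0) > 1e-9:
        raise ValueError("IBD probabilities must sum to 1")
    return 0.0 * p0 + 0.5 * p1 + 1.0 * p2


# Figure 1 pair: share 0 alleles (probability 0.5) or 2 alleles (probability 0.5).
ambiguous = pi_hat(0.5, 0.0, 0.5)
# Hypothetical pair known with certainty to share exactly 1 allele IBD.
certain = pi_hat(0.0, 1.0, 0.0)
print(ambiguous, certain)  # 0.5 0.5
```
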
Figure 2
Example of output from MERLIN (A) and ASPEX sib_ibd (B) giving the probabilities that each of a set of sibling pairs with autism shares 0, 1, or 2 alleles IBD at a particular locus. Because marker data were available at the chosen locus, these results are more informative about allele sharing than results at a locus without marker information would be. Note also that, for the third sibling pair (family 17, sibling pair identifiers 3 and 4), MERLIN and ASPEX give discrepant probabilities, despite the use of the same marker loci.
Figure 3
Scatter plots showing the relationship and agreement between $\hat{\pi}$ estimates for 122 sibling pairs with autism, generated by MERLIN and by either the ASPEX sib_ibd (A) or the ASPEX sib_phase (B) option at a particular locus on chromosome 2.
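
Correlation alone does not measure agreement, since two programs can produce highly correlated estimates that still differ pair by pair. The following hypothetical sketch reports a Pearson correlation alongside the mean absolute difference and the number of pairs whose estimates differ by more than a tolerance. The helper `agreement`, the tolerance of 0.1, and the five-pair values standing in for MERLIN and ASPEX output are illustrative assumptions; the caption does not state which agreement measures the authors used.

```python
import math


def agreement(est_a, est_b, tol=0.1):
    """Compare two programs' pi-hat estimates for the same sibling pairs."""
    if len(est_a) != len(est_b) or len(est_a) < 2:
        raise ValueError("need two equal-length lists of at least 2 estimates")
    n = len(est_a)
    mean_a, mean_b = sum(est_a) / n, sum(est_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(est_a, est_b))
    ss_a = sum((a - mean_a) ** 2 for a in est_a)
    ss_b = sum((b - mean_b) ** 2 for b in est_b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("correlation undefined for constant estimates")
    diffs = [abs(a - b) for a, b in zip(est_a, est_b)]
    return {
        "pearson_r": cov / math.sqrt(ss_a * ss_b),    # linear association
        "mean_abs_diff": sum(diffs) / n,              # average disagreement
        "n_discordant": sum(d > tol for d in diffs),  # pairs differing by > tol
    }


# Hypothetical pi-hat estimates for five pairs from two programs.
merlin_pi = [0.5, 1.0, 0.25, 0.75, 0.0]
aspex_pi = [0.5, 1.0, 0.5, 0.75, 0.0]
print(agreement(merlin_pi, aspex_pi))
```

Here a single pair (0.25 vs. 0.5) accounts for all of the disagreement, while the correlation stays high (about 0.96).
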
Figure 4
The frequency distributions of $\hat{\pi}$ values (A) and of the variance in $\hat{\pi}$ (B) for 122 affected sibling pairs at a particular locus on chromosome 2. Varying degrees of ambiguity in allele-sharing assignments are reflected both in sibling pairs with $\hat{\pi}$ values other than 0, 0.5, or 1 (A) and in sibling pairs with a nonzero variance in $\hat{\pi}$ (B).
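
The two panels correspond to two different ambiguity criteria, and only the second catches every uncertain pair. The sketch below assumes that the variance in panel B is the variance of the sharing proportion $\pi \in \{0, 0.5, 1\}$ under the pair's IBD probabilities, $\mathrm{Var}(\pi) = 0.25\,P_1 + P_2 - \hat{\pi}^2$; the caption does not give the formula. Function names and example probabilities are hypothetical.

```python
def sharing_summary(p0, p1, p2):
    """Return (pi_hat, variance of pi) for IBD probabilities p0, p1, p2,
    where pi = 0, 0.5, 1 for 0, 1, 2 alleles shared IBD."""
    if abs(p0 + p1 + p2 - 1.0) > 1e-9:
        raise ValueError("IBD probabilities must sum to 1")
    mean = 0.5 * p1 + p2
    second_moment = 0.25 * p1 + p2  # E[pi^2]
    return mean, second_moment - mean ** 2


def ambiguity_flags(p0, p1, p2, eps=1e-9):
    """(off_grid, uncertain): off_grid mirrors panel A (pi_hat not 0, 0.5, 1);
    uncertain mirrors panel B (nonzero variance of pi)."""
    mean, var = sharing_summary(p0, p1, p2)
    off_grid = all(abs(mean - v) > eps for v in (0.0, 0.5, 1.0))
    return off_grid, var > eps


# Hypothetical IBD probabilities (p0, p1, p2) for four sib pairs.
pairs = [(0.0, 0.0, 1.0), (0.5, 0.0, 0.5), (0.1, 0.6, 0.3), (0.0, 1.0, 0.0)]
flags = [ambiguity_flags(*p) for p in pairs]
n_off_grid = sum(f[0] for f in flags)
n_uncertain = sum(f[1] for f in flags)
print(n_off_grid, n_uncertain)  # 1 2
```

The Figure 1 pair, with probabilities (0.5, 0, 0.5), has $\hat{\pi} = 0.5$ and so is not flagged by the panel A criterion, but its variance of 0.25 flags it under panel B.
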
Figure 5
The effect of uninformative sibling pairs on the power of affected-sibling-pair LOD score statistics for assessing linkage. The x-axis gives the number of sibling pairs assigned a $\hat{\pi}$ value of 0.5 because of a lack of informativity. A, Each curve represents a different locus effect (in terms of risk of disease), based on the hypothetical gene effects outlined in table 1. The heavy solid line represents simulation 1 with uninformative sibling pairs removed (missing); the solid line, simulation 1 with uninformative sibling pairs assigned expected values (expected); the heavy dotted line, simulation 2, missing; and the dotted line, simulation 2, expected. B, Five simulations assuming the same genetic model (equivalent to simulation 2 in table 1), showing variation in the LOD score due to the random sampling of uninformative sibling pairs.
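
A rough Monte Carlo version of the "missing" versus "expected" comparison is sketched below. It is not a reproduction of the figure: the figure reports LOD scores under the gene models of table 1, whereas this sketch assumes a one-sided mean-sharing Z test at the 5% level and hypothetical IBD-sharing probabilities of 0.2, 0.5, and 0.3 for 0, 1, and 2 alleles at a linked locus. `simulate_power` and its parameters are illustrative names.

```python
import math
import random
from statistics import NormalDist

NULL_MEAN, NULL_VAR = 0.5, 1.0 / 8.0  # fully informative sib-pair null (assumed)
Z_CRIT = NormalDist().inv_cdf(0.95)   # one-sided 5% critical value


def z_stat(pis):
    """Mean-sharing Z statistic using the fully informative null variance."""
    return sum(p - NULL_MEAN for p in pis) / math.sqrt(len(pis) * NULL_VAR)


def simulate_power(n_informative, n_uninformative, ibd_probs, reps=2000, seed=1):
    """Estimate power when uninformative pairs are removed ('missing')
    or assigned pi = 0.5 ('expected')."""
    rng = random.Random(seed)
    hits_missing = hits_expected = 0
    for _ in range(reps):
        # Draw true sharing (0, 0.5, or 1) for each informative pair.
        pis = rng.choices([0.0, 0.5, 1.0], weights=ibd_probs, k=n_informative)
        hits_missing += z_stat(pis) > Z_CRIT
        hits_expected += z_stat(pis + [0.5] * n_uninformative) > Z_CRIT
    return hits_missing / reps, hits_expected / reps


# Hypothetical linked locus: P(IBD = 0, 1, 2) = 0.2, 0.5, 0.3 in affected sibs.
for k in (0, 20, 40, 80):
    p_missing, p_expected = simulate_power(40, k, (0.2, 0.5, 0.3))
    print(f"k={k:2d}  missing={p_missing:.3f}  expected={p_expected:.3f}")
```

Imputed pairs leave the numerator unchanged and only enlarge the denominator, so in this sketch the "expected" analysis can never reject when the "missing" analysis does not; with a fixed seed every call draws the same informative pairs, so the power gap cannot shrink as the number of imputed pairs grows.
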
Figure 6
A, Number of affected sibling pairs for which at least one of the three allele-sharing probabilities (i.e., of sharing 0, 1, or 2 alleles IBD) exceeds 0.95 (solid line), 0.75 (dashed line), or 0.5 (dotted line), for the autism data on chromosome 2. The horizontal line marks the total number of sibling pairs in the study. B, Affected-sibling-pair t statistics assessing the departure of the average $\hat{\pi}$ value from the "no linkage" null-hypothesis value of 0.5, for all sibling pairs (dashed line), for sibling pairs with $\hat{\pi}$ variance ≤ 0.10 (dotted line), and for sibling pairs with $\hat{\pi}$ variance ≤ 0.05 (solid line), for chromosome 2.
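
Both panels can be computed directly from per-pair IBD probabilities. The sketch below counts pairs whose largest IBD probability exceeds a threshold (panel A) and computes a one-sample t statistic comparing mean $\hat{\pi}$ with the null value 0.5, optionally restricted to pairs with low variance in $\pi$ (panel B). The variance definition, the use of a standard one-sample t statistic, the helper names, and the six example pairs are assumptions for illustration, not the authors' code or data.

```python
import math


def sharing_stats(p0, p1, p2):
    """Return (pi_hat, variance of pi) for IBD probabilities of sharing 0, 1, 2 alleles."""
    mean = 0.5 * p1 + p2
    return mean, 0.25 * p1 + p2 - mean ** 2


def n_confident(pairs, threshold):
    """Panel A: pairs whose largest IBD probability exceeds the threshold."""
    return sum(max(p) > threshold for p in pairs)


def t_statistic(pairs, max_var=None):
    """Panel B: one-sample t statistic for mean pi_hat versus 0.5, optionally
    keeping only pairs whose variance of pi is at most max_var."""
    vals = [m for m, v in (sharing_stats(*p) for p in pairs)
            if max_var is None or v <= max_var]
    n = len(vals)
    if n < 2:
        raise ValueError("need at least two pairs after filtering")
    mean = sum(vals) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in vals) / (n - 1))
    if sd == 0:
        raise ValueError("t statistic undefined when all pi_hat values are equal")
    return (mean - 0.5) / (sd / math.sqrt(n))


# Hypothetical IBD probabilities (p0, p1, p2) for six affected sib pairs.
pairs = [(0.0, 0.02, 0.98), (0.0, 0.97, 0.03), (0.05, 0.15, 0.80),
         (0.25, 0.5, 0.25), (0.5, 0.0, 0.5), (0.1, 0.3, 0.6)]
print([n_confident(pairs, t) for t in (0.95, 0.75, 0.5)])  # [2, 3, 4]
for max_var in (None, 0.10, 0.05):
    print(max_var, round(t_statistic(pairs, max_var), 3))
```

With these example pairs, the variance filters keep all six pairs, the first three, and the first two, respectively.
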
