Am J Hum Genet, 69 (6), 1332-47

The Discovery of Single-Nucleotide Polymorphisms--And Inferences About Human Demographic History

J Wakeley et al. Am J Hum Genet.

Abstract

A method of historical inference that accounts for ascertainment bias is developed and applied to single-nucleotide polymorphism (SNP) data in humans. The data consist of 84 short fragments of the genome that were selected, from three recent SNP surveys, to contain at least two polymorphisms in their respective ascertainment samples and that were then fully resequenced in 47 globally distributed individuals. Ascertainment bias is the deviation, from what would be observed in a random sample, caused either by discovery of polymorphisms in small samples or by locus selection based on levels or patterns of polymorphism. The three SNP surveys from which the present data were derived differ both in their protocols for ascertainment and in the size of the samples used for discovery. We implemented a Monte Carlo maximum-likelihood method to fit a subdivided-population model that includes a possible change in effective size at some time in the past. Incorrectly assuming that ascertainment bias does not exist causes errors in inference, affecting estimates of both migration rates and historical changes in size. Migration rates are overestimated when ascertainment bias is ignored. However, the direction of error in inferences about changes in effective population size (whether the population is inferred to be shrinking or growing) depends on whether the numbers of SNPs per fragment or the SNP-allele frequencies are analyzed. We use the abbreviation "SDL," for "SNP-discovered locus," in recognition of the genomic-discovery context of SNPs. When ascertainment bias is modeled fully, both the number of SNPs per SDL and their allele frequencies support a scenario of growth in effective size in the context of a subdivided population. If subdivision is ignored, however, the hypothesis of constant effective population size cannot be rejected. An important conclusion of this work is that, in demographic or other studies, SNP data are useful only to the extent that their ascertainment can be modeled.
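
The effect of discovering polymorphisms in a small sample can be illustrated with a short calculation. The sketch below is not the Monte Carlo maximum-likelihood method of the paper; it assumes a single panmictic population of constant size (so the expected number of SNPs with i derived copies is proportional to 1/i) and assumes the discovery panel is a random subset of m of the n sampled chromosomes. The function names are hypothetical.

```python
from math import comb


def neutral_sfs(n):
    """Expected proportion of SNPs with i = 1..n-1 derived copies in a sample
    of n chromosomes, under the standard neutral model (proportional to 1/i)."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


def prob_discovered(i, n, m):
    """Probability that a SNP with i derived copies among n chromosomes is
    polymorphic in a random subsample of m chromosomes (the discovery panel)."""
    all_ancestral = comb(n - i, m) / comb(n, m)  # panel carries no derived copy
    all_derived = comb(i, m) / comb(n, m)        # panel carries only derived copies
    return 1.0 - all_ancestral - all_derived


def ascertained_sfs(n, m):
    """Frequency spectrum among SNPs that were discovered in the panel of size m."""
    weighted = [p * prob_discovered(i, n, m)
                for i, p in enumerate(neutral_sfs(n), start=1)]
    total = sum(weighted)
    return [w / total for w in weighted]


if __name__ == "__main__":
    n, m = 10, 2  # assumed sample and discovery-panel sizes, for illustration only
    for i, (p, q) in enumerate(zip(neutral_sfs(n), ascertained_sfs(n, m)), start=1):
        print(f"{i:2d} copies: random sample {p:.3f}, ascertained {q:.3f}")
```

Under these assumptions, rare alleles are under-represented among the discovered SNPs, which is the kind of deviation from a random sample that the abstract describes.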

Figures

Figure 1
Example genealogy, drawn with branch lengths equal to the coalescent expectations, which shows the structure of the data analyzed here: “A,” “D,” and “O” are, respectively, samples that are only in the ascertainment set, samples that are only in the data set, and “overlap” samples (i.e., those which are in both the data set and the ascertainment set). Three types of branches are distinguished, corresponding to the three kinds of observable polymorphisms discussed in the text.
Figure 2
Distribution of Tajima’s (1989) D among SDLs, in each of the three data sets.
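
For readers unfamiliar with the statistic, a minimal sketch of Tajima's (1989) D for one aligned locus follows. It assumes equal-length sequences with no gaps or missing data, and it counts a site with more than two alleles as a single segregating site; the function name is hypothetical and this is not the code used in the paper.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences.

    Returns None when the locus has no segregating sites."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences (variance is zero for n < 4)")
    if any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating sites (alignment columns with more than one state)
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        return None

    # pi: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
```

A locus dominated by rare variants gives a negative D, and one dominated by intermediate-frequency variants gives a positive D; for example, `tajimas_d(["A", "A", "A", "T"])` is negative and `tajimas_d(["A", "A", "T", "T"])` is positive.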
Figure 3
Expected numbers of SNPs segregating in different frequencies, in a sample of size nD + nO = 10, relative to the number of singleton polymorphisms; results are averages, over 100,000 simulated data sets, for a 400-bp-long SDL, with θ = 0.0005 per base pair. a, Effect of requiring an SDL to have at least one SNP in the first nO samples drawn from the population. b, Effect of separating SDLs into classes with different numbers of SNPs, with nD = 0.
Figure 4
Coefficient of variation of S, in a sample of size nD + nO = 10; results are averages, over 100,000 simulated data sets, for a 400-bp-long SDL, with θ = 0.0005 per base pair. a, Effect of requiring SDLs to have at least k SNPs, under the assumption nD = 0. b, Effect of requiring an SDL to have at least one SNP that must be segregating in the first nO samples drawn from the population.
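
As a point of comparison for this figure, the coefficient of variation of S without any ascertainment condition can be computed directly. The sketch below is an assumption-laden baseline, not a reproduction of the simulations: it uses the standard results for a panmictic, constant-size population with no recombination within the locus, E[S] = θ·a1 and Var[S] = θ·a1 + θ²·a2, with θ taken per locus. The function name is hypothetical.

```python
from math import sqrt


def cv_segregating_sites(n, theta_per_bp, length_bp):
    """Coefficient of variation of the number of segregating sites S in a
    sample of n chromosomes, with no ascertainment condition applied."""
    theta = theta_per_bp * length_bp              # per-locus theta
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    mean = theta * a1                             # E[S]
    variance = theta * a1 + theta ** 2 * a2       # Var[S], no recombination
    return sqrt(variance) / mean


if __name__ == "__main__":
    # Parameter values taken from the caption: n = 10, 400 bp, theta = 0.0005 per bp
    print(cv_segregating_sites(10, 0.0005, 400))
```

Conditioning on a minimum number of SNPs, as in the figure, changes both the mean and the variance of S, so the ascertained values need not match this baseline.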
Figure 5
Estimates of 2Nm, for data set 2, both when ascertainment is ignored and when it is modeled. For this data set, five demes had infinite-migration-rate estimates when ascertainment was ignored; these five demes are not plotted.
Figure 6
Likelihood surfaces for Q and T, based on the distribution of nD and nO for each of the three data sets, when ascertainment bias is ignored (a) and when it is modeled (b).
Figure 7
Combined likelihood surfaces for Q and T, based on the distribution of nD and nO for all three data sets, when ascertainment bias is ignored (a) and when it is modeled (b).
Figure 8
Likelihood surfaces for Q and T, based on the allele frequencies at data-only and overlap SNPs, conditioned on their numbers, for each of the three data sets, when ascertainment bias is ignored (a) and when it is modeled (b).
Figure 9
Combined likelihood surfaces for Q and T, for all the data, (a) when the population is assumed to be panmictic and (b) fitting the subdivided-population model described in the text.
