Nature, 475 (7357), 493-6

Inference of Human Population History From Individual Whole-Genome Sequences

Heng Li et al. Nature.


The history of human population size is important for understanding human evolution. Various studies have found evidence for a founder event (bottleneck) in East Asian and European populations, associated with the human dispersal out-of-Africa event around 60 thousand years (kyr) ago. However, these studies have had to assume simplified demographic models with few parameters, and they do not provide a precise date for the start and stop times of the bottleneck. Here, with fewer assumptions on population size changes, we present a more detailed history of human population sizes between approximately ten thousand and a million years ago, using the pairwise sequentially Markovian coalescent model applied to the complete diploid genome sequences of a Chinese male (YH), a Korean male (SJK), three European individuals (J. C. Venter, NA12891 and NA12878 (ref. 9)) and two Yoruba males (NA18507 (ref. 10) and NA19239). We infer that European and Chinese populations had very similar population-size histories before 10-20 kyr ago. Both populations experienced a severe bottleneck 10-60 kyr ago, whereas African populations experienced a milder bottleneck from which they recovered earlier. All three populations have an elevated effective population size between 60 and 250 kyr ago, possibly due to population substructure. We also infer that the differentiation of genetically modern humans may have started as early as 100-120 kyr ago, but considerable genetic exchanges may still have occurred until 20-40 kyr ago.
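The sizes and dates quoted above come from rescaling the model's dimensionless output. The sketch below assumes the scaling convention θ0 = 4N0μ per base, interval times in units of 2N0 generations and relative sizes λk = Nk/N0. The per-generation mutation rate μ and generation time g are not given in this excerpt, so the values in the usage line are illustrative assumptions, and `ScaledHistory` and `to_real_units` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScaledHistory:
    theta0: float      # per-base 4*N0*mu estimated by the model (assumed convention)
    t: List[float]     # interval start times, in units of 2*N0 generations
    lam: List[float]   # relative population size lambda_k = N_k / N0


def to_real_units(h: ScaledHistory, mu: float, gen_years: float) -> Tuple[float, List[Tuple[float, float]]]:
    """Return N0 and a list of (years ago, effective population size) points."""
    n0 = h.theta0 / (4.0 * mu)                        # invert theta0 = 4*N0*mu
    years = [2.0 * n0 * tk * gen_years for tk in h.t]  # 2*N0 generations per time unit
    sizes = [lk * n0 for lk in h.lam]                  # N_k = lambda_k * N0
    return n0, list(zip(years, sizes))


# Illustrative numbers only; mu and gen_years are assumptions.
h = ScaledHistory(theta0=0.001, t=[0.0, 0.5, 2.0], lam=[1.0, 0.2, 1.5])
n0, points = to_real_units(h, mu=2.5e-8, gen_years=25.0)
print(n0, points)
```

Because N0 appears in both the time and size conversions, any error in the assumed μ shifts the whole curve diagonally on a log-log plot rather than changing its shape.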


Figure 1. Illustration of the PSMC model and its application to simulated data
(a) The PSMC infers the local time to the most recent common ancestor (TMRCA) based on the local density of heterozygotes, using a Hidden Markov Model, where the observation is a diploid sequence, the hidden states are discretized TMRCA and the transitions represent ancestral recombination events. (b) We used the ms software to simulate the TMRCA relating the two alleles of an individual across a 200 kb region (the thick red line), and inferred the local TMRCA at each locus using the PSMC (the heat map). The inferred distribution usually includes the true TMRCA, with the greatest errors at transition points.
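The HMM in panel (a) can be illustrated with a toy posterior decoder. This is not the PSMC implementation: it assumes a constant population size, bins the sequence so each observation is 0 (homozygous bin) or 1 (heterozygous bin), and replaces the exact PSMC transition kernel with a simplified "keep the TMRCA, or recombine and redraw it from the prior" kernel. The function names, the 20-state log-spaced grid and the θ and ρ values are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def tmrca_states(n: int, t_max: float = 15.0) -> Tuple[List[float], List[float]]:
    """Prior and representative time for n discretized TMRCA states.

    Times are in units of 2*N0 generations; boundaries are log-spaced and the
    last state is the open interval [b_{n-1}, inf).
    """
    b = [0.1 * math.exp(i / n * math.log(1 + 10 * t_max)) - 0.1 for i in range(n)]
    b.append(math.inf)
    # Constant-size coalescent: TMRCA ~ Exp(1), so P(state k) = e^-b_k - e^-b_{k+1}.
    prior = [math.exp(-b[k]) - math.exp(-b[k + 1]) for k in range(n)]
    # Representative time: interval midpoint, or b_{n-1} + 1 for the open interval.
    tau = [(b[k] + b[k + 1]) / 2 if k < n - 1 else b[k] + 1.0 for k in range(n)]
    return prior, tau


def posterior_tmrca(obs: Sequence[int], theta: float, rho: float, n: int = 20) -> Tuple[List[float], float]:
    """Posterior mean TMRCA per bin and the log-likelihood (forward-backward)."""
    prior, tau = tmrca_states(n)
    p_het = [1.0 - math.exp(-theta * t) for t in tau]  # P(het bin | TMRCA)
    r = [1.0 - math.exp(-rho * t) for t in tau]        # P(recombination | TMRCA)

    def emit(k: int, o: int) -> float:
        return p_het[k] if o else 1.0 - p_het[k]

    # Forward pass, normalizing at each bin to avoid underflow.
    alphas: List[List[float]] = []
    scales: List[float] = []
    for i, o in enumerate(obs):
        if i == 0:
            a = [prior[k] * emit(k, o) for k in range(n)]
        else:
            prev = alphas[-1]
            jump = sum(prev[k] * r[k] for k in range(n))  # mass that recombines
            a = [(prev[l] * (1.0 - r[l]) + prior[l] * jump) * emit(l, o) for l in range(n)]
        c = sum(a)
        alphas.append([x / c for x in a])
        scales.append(c)

    # Backward pass reusing the forward scaling factors; posterior = alpha * beta.
    beta = [1.0] * n
    post_mean = [0.0] * len(obs)
    for t in range(len(obs) - 1, -1, -1):
        post_mean[t] = sum(alphas[t][k] * beta[k] * tau[k] for k in range(n))
        if t > 0:
            eb = [emit(l, obs[t]) * beta[l] for l in range(n)]
            back_jump = sum(prior[l] * eb[l] for l in range(n))
            beta = [((1.0 - r[k]) * eb[k] + r[k] * back_jump) / scales[t] for k in range(n)]
    return post_mean, sum(math.log(c) for c in scales)


# A heterozygote-free stretch followed by a heterozygote-dense stretch.
obs = [0] * 300 + [1 if i % 3 == 0 else 0 for i in range(300)]
pm, loglik = posterior_tmrca(obs, theta=0.1, rho=0.025)
print(sum(pm[100:200]) / 100, sum(pm[400:500]) / 100)
```

The decoded TMRCA is low where heterozygotes are absent and high where they are dense, mirroring the relationship the caption describes; the posterior is least certain near the switch between the two stretches.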
Figure 2. PSMC estimate on simulated data
(a) PSMC estimate on data simulated by msHOT. The blue curve is the population size history used in simulation; the red curve is the PSMC estimate on the originally simulated sequence; the 100 thin green curves are the PSMC estimates on 100 sequences randomly resampled from the original sequence. (b) PSMC estimate on data with variable mutation rate or with recombination hotspots.
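The caption does not say how the 100 resampled sequences were built. A common choice, assumed here, is a block bootstrap: cut the sequence into fixed-length segments and draw segments with replacement, which keeps linkage within each segment intact. The segment length and the helper name `bootstrap_segments` are hypothetical.

```python
import random
from typing import List


def bootstrap_segments(seq: str, seg_len: int, rng: random.Random) -> str:
    """Block bootstrap: resample fixed-length segments of seq with replacement.

    The replicate has as many segments as the original; if len(seq) is not a
    multiple of seg_len, the shorter final segment can be drawn several times,
    so the replicate length may differ slightly from the original.
    """
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    segments: List[str] = [seq[i:i + seg_len] for i in range(0, len(seq), seg_len)]
    return "".join(rng.choice(segments) for _ in segments)


# 100 replicates, one per seed, each of which would be re-analysed separately.
original = "ABCDEFGHIJKLMNOPQRST"
replicates = [bootstrap_segments(original, 4, random.Random(seed)) for seed in range(100)]
```

The spread of estimates across replicates (the thin curves) then gives a rough sense of the variance of the estimate at each time point.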
Figure 3. PSMC estimate on real data
(a) The population sizes inferred from autosomes of six individuals. 5%, 10% and 29% of heterozygotes are assumed to be missing in CHN.A, KOR.A and EUR1.A, respectively. (b) The population sizes inferred from male-combined X chromosomes and the simulated African-Asian combined sequences from the best-fit model by Schaffner et al. Sizes inferred from X chromosome data are scaled by 4/3; the neutral mutation rate on X, which is used in time scaling, is estimated with the ratio of male-to-female mutation rate α equal to 2 (Methods).
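The two scalings in this caption can be written out directly. The sketch assumes an equal sex ratio: an X chromosome then spends 2/3 of generations in females and 1/3 in males, so with α = μm/μf the X-to-autosome mutation rate ratio is ((2 + α)/3)/((1 + α)/2), and the X effective size is 3/4 of the autosomal one (hence the 4/3 factor). For panel (a), it further assumes that missing heterozygotes uniformly thin the observed heterozygosity by a factor (1 − FNR). All function names are hypothetical.

```python
def x_to_autosome_mutation_ratio(alpha: float) -> float:
    """mu_X / mu_A given alpha, the male-to-female mutation rate ratio.

    X: 2/3 of generations in females, 1/3 in males -> mu_f * (2 + alpha) / 3.
    Autosomes: half in each sex                   -> mu_f * (1 + alpha) / 2.
    """
    return 2.0 * (2.0 + alpha) / (3.0 * (1.0 + alpha))


def x_size_on_autosome_scale(n_x: float) -> float:
    """X effective size is 3/4 of the autosomal one, so rescale by 4/3."""
    return n_x * 4.0 / 3.0


def fnr_corrected_theta(theta_obs: float, fnr: float) -> float:
    """Assumes heterozygotes are missed uniformly at false-negative rate fnr."""
    if not 0.0 <= fnr < 1.0:
        raise ValueError("fnr must be in [0, 1)")
    return theta_obs / (1.0 - fnr)


print(x_to_autosome_mutation_ratio(2.0))  # 8/9, about 0.889
```

With α = 2 the X mutation rate is 8/9 of the autosomal rate, so using the autosomal rate unchanged would bias the X time scale.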
