Nature. 2016 Oct 13;538(7624):238-242.
doi: 10.1038/nature19792. Epub 2016 Sep 21.

Genomic Analyses Inform on Migration Events During the Peopling of Eurasia

Luca Pagani #  1   2   3 Daniel John Lawson #  4 Evelyn Jagoda #  2   5 Alexander Mörseburg #  2 Anders Eriksson #  6   7 Mario Mitt  8   9 Florian Clemente  2   10 Georgi Hudjashov  1   11   12 Michael DeGiorgio  13 Lauri Saag  1 Jeffrey D Wall  14 Alexia Cardona  2   15 Reedik Mägi  8 Melissa A Wilson Sayres  16   17 Sarah Kaewert  2 Charlotte Inchley  2 Christiana L Scheib  2 Mari Järve  1 Monika Karmin  1   18 Guy S Jacobs  19   20 Tiago Antao  21 Florin Mircea Iliescu  2 Alena Kushniarevich  1   22 Qasim Ayub  23 Chris Tyler-Smith  23 Yali Xue  23 Bayazit Yunusbayev  1   24 Kristiina Tambets  1 Chandana Basu Mallick  1 Lehti Saag  18 Elvira Pocheshkhova  25 George Andriadze  26 Craig Muller  27 Michael C Westaway  28 David M Lambert  28 Grigor Zoraqi  29 Shahlo Turdikulova  30 Dilbar Dalimova  31 Zhaxylyk Sabitov  32 Gazi Nurun Nahar Sultana  33 Joseph Lachance  34   35 Sarah Tishkoff  36 Kuvat Momynaliev  37 Jainagul Isakova  38 Larisa D Damba  39 Marina Gubina  39 Pagbajabyn Nymadawa  40 Irina Evseeva  41   42 Lubov Atramentova  43 Olga Utevska  43 François-Xavier Ricaut  44 Nicolas Brucato  44 Herawati Sudoyo  45 Thierry Letellier  44 Murray P Cox  12 Nikolay A Barashkov  46   47 Vedrana Skaro  48   49 Lejla Mulahasanovic  50 Dragan Primorac  51   52   53   49 Hovhannes Sahakyan  1   54 Maru Mormina  55 Christina A Eichstaedt  2   56 Daria V Lichman  39   57 Syafiq Abdullah  58 Gyaneshwer Chaubey  1 Joseph T S Wee  59 Evelin Mihailov  8 Alexandra Karunas  24   60 Sergei Litvinov  24   60   1 Rita Khusainova  24   60 Natalya Ekomasova  60 Vita Akhmetova  24 Irina Khidiyatova  24   60 Damir Marjanović  61   62 Levon Yepiskoposyan  54 Doron M Behar  1 Elena Balanovska  63 Andres Metspalu  7   8 Miroslava Derenko  64 Boris Malyarchuk  64 Mikhail Voevoda  65   39   57 Sardana A Fedorova  47   46 Ludmila P Osipova  39   57 Marta Mirazón Lahr  66 Pascale Gerbault  67 Matthew Leavesley  68   69 Andrea Bamberg Migliano  70 Michael Petraglia  71 Oleg Balanovsky  72   63 Elza K Khusnutdinova  24   60 Ene Metspalu  1   18 Mark G Thomas  67 Andrea Manica  7 Rasmus Nielsen  73 Richard Villems #  1   18   74 Eske Willerslev #  27 Toomas Kivisild #  2   1 Mait Metspalu #  1
Free PMC article


Luca Pagani et al. Nature. 2016.


High-coverage whole-genome sequence studies have so far focused on a limited number of geographically restricted populations, or been targeted at specific diseases, such as cancer. Nevertheless, the availability of high-resolution genomic data has led to the development of new methodologies for inferring population history and refuelled the debate on the mutation rate in humans. Here we present the Estonian Biocentre Human Genome Diversity Panel (EGDP), a dataset of 483 high-coverage human genomes from 148 populations worldwide, including 379 new genomes from 125 populations, which we group into diversity and selection sets. We analyse this dataset to refine estimates of continent-wide patterns of heterozygosity, long- and short-distance gene flow, archaic admixture, and changes in effective population size through time as well as for signals of positive or balancing selection. We find a genetic signature in present-day Papuans that suggests that at least 2% of their genome originates from an early and largely extinct expansion of anatomically modern humans (AMHs) out of Africa. Together with evidence from the western Asian fossil record, and admixture between AMHs and Neanderthals predating the main Eurasian expansion, our results contribute to the mounting evidence for the presence of AMHs out of Africa earlier than 75,000 years ago.

Conflict of interest statement

The authors declare no competing financial interests.


ED1. Sample Diversity and Archaic signals.
A: Map of sample locations, highlighting the Diversity and Selection Sets. B: ADMIXTURE plot (K = 8 and K = 14) relating the inferred genetic structure to the studied populations and their regions of origin. C: Sample-level heterozygosity plotted against distance from Addis Ababa; the trend line is fitted to non-African samples only. The inset shows the waypoints used to compute the distance, in kilometres, for each sample. D: Boxplots of the distribution of D statistics computed with the Denisova (red), Altai Neanderthal (green) and Croatian Neanderthal (blue) genomes for each regional group of samples. In Oceanians, the Altai D values are remarkably similar to the Denisova D values, whereas in the other groups the Altai boxplots tend to resemble the Croatian Neanderthal ones.
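Panel C depends on a path distance from Addis Ababa through waypoints and on a trend line fitted to non-African samples only. The Python sketch below shows one way to compute both. It assumes great-circle (haversine) legs on a spherical Earth of radius 6371 km and approximate coordinates for Addis Ababa; the function names are hypothetical and not taken from the authors' scripts.

```python
import math

import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius
ADDIS_ABABA = (9.03, 38.74)  # assumption: approximate (lat, lon) in degrees


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def path_distance_km(sample, waypoints, origin=ADDIS_ABABA):
    """Sum of great-circle legs origin -> waypoint_1 -> ... -> sample."""
    stops = [origin, *waypoints, sample]
    return sum(haversine_km(p, q) for p, q in zip(stops, stops[1:]))


def non_african_trend(distances_km, heterozygosity, is_african):
    """Least-squares (slope, intercept) fitted to non-African samples only."""
    d = np.asarray(distances_km, dtype=float)
    h = np.asarray(heterozygosity, dtype=float)
    keep = ~np.asarray(is_african, dtype=bool)
    slope, intercept = np.polyfit(d[keep], h[keep], deg=1)
    return float(slope), float(intercept)
```

Routing through waypoints rather than taking a direct great-circle line avoids paths that cross open ocean, which is presumably why the inset lists them.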
ED2. Data quality checks and heterozygosity patterns.
Concordance of DNA sequencing (Complete Genomics Inc.) and DNA genotyping (Illumina genotyping arrays) data (ref-ref, het-ref-alt and hom-alt-alt; see SI 1.6) from chip (A) and sequence data (B). Coverage (depth) distribution of variable positions, divided by DNA source (blood or saliva) and Complete Genomics calling pipeline (release version) (C). Genome-wide distribution of the transition/transversion ratio, subdivided by DNA source (saliva or blood) and by Complete Genomics calling pipeline (D). Genome-wide distribution of the transition/transversion ratio, subdivided by chromosome (E). Inter-chromosome differences in observed heterozygosity in 447 samples from the Diversity Set (F), in a set of 50 unpublished genomes from the Estonian Genome Center sequenced on an Illumina platform at an average coverage exceeding 30x (G), and in phase 3 of the 1000 Genomes Project (H). For each chromosome, the total number of observed heterozygous sites was divided by the number of accessible base pairs reported by the 1000 Genomes Project.
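The quality metrics in this figure reduce to simple counts. The sketch below assumes genotypes are given as pairs of allele indices (0 = reference, 1 = alternative) and single-nucleotide alleles as letters. It classifies genotypes into the three classes named in the caption, tabulates concordance between two platforms, computes a transition/transversion ratio, and normalises heterozygous-site counts by accessible base pairs. All names are illustrative.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def genotype_class(gt):
    """Map a diploid genotype of allele indices to the caption's three classes."""
    a, b = sorted(gt)
    if (a, b) == (0, 0):
        return "ref-ref"
    if (a, b) == (0, 1):
        return "het-ref-alt"
    if (a, b) == (1, 1):
        return "hom-alt-alt"
    raise ValueError(f"unsupported genotype {gt}")


def concordance_by_class(reference_gts, other_gts):
    """Fraction of sites in each class of the reference calls that the other
    platform assigns to the same class."""
    totals, matches = Counter(), Counter()
    for ref_gt, other_gt in zip(reference_gts, other_gts):
        cls = genotype_class(ref_gt)
        totals[cls] += 1
        matches[cls] += int(genotype_class(other_gt) == cls)
    return {cls: matches[cls] / totals[cls] for cls in totals}


def ti_tv_ratio(snvs):
    """Transitions (A<->G, C<->T) divided by transversions for (ref, alt) pairs."""
    ti = tv = 0
    for ref, alt in snvs:
        if ref == alt:
            continue
        if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")


def heterozygosity_per_bp(het_sites, accessible_bp):
    """Per-chromosome observed heterozygosity: heterozygous sites / accessible bp."""
    return {chrom: het_sites[chrom] / accessible_bp[chrom] for chrom in het_sites}
```

Swapping which platform is passed as `reference_gts` gives the chip-based and sequence-based views of the same comparison.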
ED3. fineSTRUCTURE shared ancestry analysis.
ChromoPainter and fineSTRUCTURE results, showing the inferred populations together with the underlying (averaged) number of haplotypes that an individual in a population (rows) receives from donor individuals in other populations (columns). In total, fineSTRUCTURE infers 108 populations. The dendrogram shows the inferred relationships between populations; the numbers on it give the proportion of MCMC iterations in which each population split is observed (shown only where this is less than 1). Each geographical region has a unique colour, which is used to label its individuals. The number of individuals in each population is given in the label; e.g. “4Italians; 3Albanians” is a population of size 7 containing 4 individuals from Italy and 3 from Albania.
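The population-level matrix described here averages an individual-level count matrix (recipients in rows, donors in columns) over population labels. The sketch below is one plausible way to do that averaging, assuming a square NumPy matrix of ChromoPainter-style haplotype counts and excluding self-copying from the average. It is not fineSTRUCTURE's own code, and the function name is hypothetical.

```python
import numpy as np


def population_average_coancestry(chunks, labels):
    """Average count that an individual in each recipient population (rows)
    receives from an individual in each donor population (columns).

    chunks: n x n matrix, chunks[i, j] = haplotypes recipient i copies from donor j.
    labels: length-n sequence of population labels.
    Pairs with i == j are excluded because individuals do not copy from themselves.
    """
    chunks = np.asarray(chunks, dtype=float)
    labels = np.asarray(labels)
    n = chunks.shape[0]
    pops = sorted(set(labels.tolist()))
    not_self = ~np.eye(n, dtype=bool)
    out = np.full((len(pops), len(pops)), np.nan)
    for i, recipient in enumerate(pops):
        for j, donor in enumerate(pops):
            # (recipient, donor) pairs belonging to this block of the matrix
            mask = np.outer(labels == recipient, labels == donor) & not_self
            if mask.any():
                out[i, j] = chunks[mask].mean()
    return pops, out
```

A single-individual population has no valid within-population pair, so its diagonal entry stays NaN in this sketch.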
ED4. MSMC genetic split times and outgroup f3 results.
The MSMC split times estimated between each sample and a reference panel of 9 genomes were linearly interpolated to infer the full square matrix (A). Summary of outgroup f3 statistics, using Yoruba as the outgroup, for each pair of non-African populations (B) or between each population and an ancient sample (C). Populations are grouped by geographic region and ordered by increasing distance from Africa (left to right for columns, bottom to top for rows). Colour bars at the left and top of the heat map indicate the colour coding of geographical regions, and individual population labels are given at the right and bottom. The f3 statistics are scaled row by row to lie between 0 (black) and 1 (red). For a focal population $X$ (a row), let $m = \min_{Y \neq X} f_3(X, Y; \text{Yoruba})$ and $M = \max_{Y \neq X} f_3(X, Y; \text{Yoruba})$. The scaled statistic for each cell in that row is $f_3^{\text{scaled}} = (f_3 - m)/(M - m)$, so that the smallest f3 in the row has $f_3^{\text{scaled}} = 0$ (black) and the largest has $f_3^{\text{scaled}} = 1$ (red); the diagonal is set to 1 (red) by convention. Because each row is scaled separately, the heat map is asymmetric: in each row, the population closest to the focal population has value 1 and the farthest has value 0. Scanning across the columns of a given row therefore reveals the populations that share the most ancestry with that row's focal population.
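The row-wise scaling defined above is easy to get wrong because the diagonal must be excluded when taking the row minimum and maximum. The sketch below implements that scaling for a square matrix, together with the textbook outgroup-f3 estimator, the mean over SNPs of $(p_O - p_X)(p_O - p_Y)$ for allele frequencies $p$. The estimator here omits any finite-sample bias correction, so treat it as an illustration under that assumption rather than the authors' pipeline.

```python
import numpy as np


def outgroup_f3(p_x, p_y, p_out):
    """Mean over SNPs of (p_O - p_X) * (p_O - p_Y); no bias correction."""
    p_x, p_y, p_out = (np.asarray(p, dtype=float) for p in (p_x, p_y, p_out))
    return float(np.mean((p_out - p_x) * (p_out - p_y)))


def scale_rows(f3):
    """Scale each row to [0, 1] using its off-diagonal min (m) and max (M),
    then set the diagonal to 1, as described in the caption."""
    f3 = np.asarray(f3, dtype=float)
    n = f3.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    scaled = np.empty_like(f3)
    for x in range(n):
        row = f3[x, off_diag[x]]
        m, M = row.min(), row.max()
        span = M - m if M > m else 1.0  # guard: a constant row maps to 0
        scaled[x] = (f3[x] - m) / span
        scaled[x, x] = 1.0
    return scaled
```

Because `scale_rows` uses each row's own `m` and `M`, `scaled[x, y]` and `scaled[y, x]` generally differ, which is the asymmetry the caption describes.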
ED5. Geographical patterns of genetic diversity.
Isolation-by-distance patterns across areas of high genetic gradient, using Europe as a baseline. The samples used in each analysis are indicated by coloured lines on the maps to the right of each plot. The panels show FST as a function of distance across the Himalayas (A), the Ural Mountains (B) and the Caucasus (C), as shown on the colour-coded map (D). Effect of sampling gaps in Europe (E): we removed samples in stripes running either north to south (F) or west to east (G), creating gaps comparable in size to the sampling gaps in the dataset. Effective migration surfaces inferred by EEMS (H).
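The caption does not name the FST estimator. As an assumption, the sketch below uses Hudson's estimator computed as a ratio of averages over SNPs, builds a mask that drops samples falling in a coordinate stripe (mimicking the gap experiments in panels F and G), and fits the slope of FST on distance by least squares. Function names and inputs are hypothetical.

```python
import numpy as np


def hudson_fst(p1, p2, n1, n2):
    """Hudson's FST (ratio of averages) from per-SNP allele frequencies p1, p2
    in two samples of n1 and n2 chromosomes."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return float(num.sum() / den.sum())


def drop_stripe(coords, low, high):
    """Boolean mask keeping samples whose coordinate lies outside [low, high).
    Pass longitudes for a north-south stripe, latitudes for a west-east one."""
    c = np.asarray(coords, dtype=float)
    return ~((c >= low) & (c < high))


def ibd_slope(distances_km, fst_values):
    """Least-squares slope of pairwise FST on geographic distance."""
    d = np.asarray(distances_km, dtype=float)
    f = np.asarray(fst_values, dtype=float)
    slope, _intercept = np.polyfit(d, f, 1)
    return float(slope)
```

Comparing `ibd_slope` before and after applying a `drop_stripe` mask indicates how much a sampling gap alone changes the apparent isolation-by-distance pattern.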
ED6. Summary of positive selection results.
Barplot comparing the frequency distributions of functional variants in Africans and non-Africans (A), showing the distribution of exonic SNPs by functional impact (synonymous, missense and nonsense) as a function of allele frequency. The data from both groups were normalised to a sample size of n = 21; Africans show significantly more rare variants across all site classes (chi-squared p < 10^-15). Results of 1000 bootstrap replicates of the Rxy test for a subset of pigmentation genes highlighted by GWAS (n = 32) (B). The horizontal line marks the African reference (x = 1) against which all other groups are compared. The blue and red marks show the 95th and 5th percentiles of the bootstrap distributions, respectively. If the 95th percentile is below 1, the population shows a significant excess of missense variants in the pigmentation subset relative to Africans; this is the case for all non-African groups except Oceanians. Pools of individuals for the selection scans (C): a fineSTRUCTURE-based coancestry matrix was used to define twelve groups of populations for the downstream selection scans. These groups are highlighted in the plot by boxes with dashed edges. The number of individuals in each group is reported in Table SI2:3.2-I.
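Two computations underlie panels A and B. The sketch below assumes that normalising to a sample size of n = 21 means hypergeometric down-projection of each SNP's derived-allele count to 21 sampled chromosomes (the caption states neither the units nor the method). It computes Rxy as the ratio of $L_{X,\neg Y} = \sum_i f_{X,i}(1 - f_{Y,i})$ to $L_{Y,\neg X}$, with percentile bounds from bootstrap resampling of sites. Rxy analyses are often also normalised against a background set of sites; that step is omitted here. All names are hypothetical.

```python
from math import comb

import numpy as np


def project_sfs(derived_counts, n, m=21):
    """Expected site frequency spectrum (bins 0..m) after subsampling each SNP
    from n to m chromosomes without replacement (hypergeometric projection)."""
    sfs = np.zeros(m + 1)
    total = comb(n, m)
    for k in derived_counts:
        for j in range(m + 1):
            # comb() returns 0 when j > k or m - j > n - k
            sfs[j] += comb(k, j) * comb(n - k, m - j) / total
    return sfs


def l_x_not_y(f_x, f_y):
    """Sum over sites of f_X * (1 - f_Y)."""
    return float(np.sum(np.asarray(f_x, float) * (1 - np.asarray(f_y, float))))


def r_xy(f_x, f_y):
    """Ratio L(X, not Y) / L(Y, not X) for derived-allele frequencies."""
    return l_x_not_y(f_x, f_y) / l_x_not_y(f_y, f_x)


def bootstrap_r_xy(f_x, f_y, n_boot=1000, seed=0):
    """5th and 95th percentiles of Rxy over bootstrap resamples of sites."""
    rng = np.random.default_rng(seed)
    f_x, f_y = np.asarray(f_x, float), np.asarray(f_y, float)
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(f_x), size=len(f_x))
        reps.append(r_xy(f_x[idx], f_y[idx]))
    low, high = np.percentile(reps, [5, 95])
    return float(low), float(high)
```

Projecting both groups to the same `m` before comparing spectra removes the trivial effect of unequal sample sizes on the number of rare variants observed.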
ED7. Length of haplotypes assigned as African by fineSTRUCTURE as a function of genome proportion.
A: Results for the 447-genome Diversity Panel, showing population-label averages (large crosses) along with individuals (small dots). B: Results for the Diversity Panel with relatives excluded, to check whether including related individuals affects the African genome fraction. Individuals sharing more than 2% of their genome were forbidden from receiving haplotypes from each other, and the painting was re-run on a large subset of the genome (all ROH regions from any individual). C: African haplotypes within ROH only. To guard against phasing errors, we analysed only regions in which an individual was in a long (>500 kb) run of homozygosity (ROH), identified using the PLINK command “--homozyg-window-kb 500000 --homozyg-window-het 0 --homozyg-density 10”. Because such regions are few, we report only the population average, with its standard error, for populations with two or more individuals, and exclude populations whose 95% CI includes 0. Note the logarithmic axis. D: Ancient DNA panel results. We used a different panel of 109 individuals, which included 3 ancient genomes, painted chromosomes 11, 21 and 22, and report as crosses the population averages for populations with two or more individuals. The solid thin lines show the position of each population when only modern samples are analysed; the dashed lines lead off the figure to the positions of the ancient hominins and the African samples.
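The filtering in panel C combines interval arithmetic with a per-population summary. The sketch below assumes that haplotype segments and ROH calls are (start, end) base-pair intervals on one chromosome. It keeps ROH longer than 500 kb, measures how much of a segment lies inside them, and reports population means with standard errors, skipping populations with fewer than two individuals or whose 95% CI includes 0. The normal approximation (mean ± 1.96 SE) is an assumption, and the PLINK ROH calling itself is not reproduced.

```python
import numpy as np


def long_rohs(rohs, min_len_bp=500_000):
    """Keep ROH intervals (start, end) longer than min_len_bp."""
    return [(s, e) for s, e in rohs if e - s > min_len_bp]


def overlap_bp(segment, rohs):
    """Base pairs of a (start, end) haplotype segment lying inside the ROH
    intervals (assumed non-overlapping)."""
    s, e = segment
    return sum(max(0, min(e, r_end) - max(s, r_start)) for r_start, r_end in rohs)


def population_summary(values_by_pop, z=1.96):
    """Mean and standard error per population; skip populations with fewer
    than two individuals or whose approximate 95% CI includes 0."""
    summary = {}
    for pop, vals in values_by_pop.items():
        v = np.asarray(vals, dtype=float)
        if v.size < 2:
            continue
        mean = v.mean()
        se = v.std(ddof=1) / np.sqrt(v.size)
        if mean - z * se <= 0.0 <= mean + z * se:
            continue
        summary[pop] = (float(mean), float(se))
    return summary
```

Excluding populations whose interval reaches 0 also keeps every plotted average positive, which matters on the logarithmic axis.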
ED8. Linear behaviour of MSMC split estimates in the presence of admixture.
The MSMC split-time signature (Truth, left-most column) of the examined Central Asian (A), East African (B) and African-American (C) genomes could be recapitulated (Reconstruction, second column from the left) as a linear mixture of other MSMC split-time curves. The admixture proportions inferred by our method (top of each admixture component column) were remarkably similar to those previously reported in the literature. MSMC split times (D) calculated after re-phasing an Estonian and a Papuan (Koinanbe) genome together with all available West African and Pygmy genomes from our dataset, to minimise putative phasing artefacts. The cross-coalescence rate curves shown here are quantitatively comparable with those in Figure 2A, indicating that phasing artefacts are unlikely to explain the observed past-ward shift of the Papuan-African split time. Boxplot (E) showing the distribution of differences between African-Papuan and African-Eurasian split times obtained from coalescent simulations, assembled by random sampling with replacement into 2000 sets of 6 individuals (matching the 6 Papuans in our empirical dataset), each made up of 1.5 Gb of sequence. Each 5 Mb chromosome was simulated with the following command line, where *DIV* = 0.064, 0.4 or 0.8 for the xOoA, Denisova (Den) and Divergent Denisova (DeepDen) cases, respectively:
ms0ancient2 10 1 .065 .05 -t 5000. -r 3000. 5000000 -I 7 1 1 1 1 2 2 2 -en 0. 1 .2 -en 0. 2 .2 -en 0. 3 .2 -en 0. 4 .2 -es .025 7 .96 -en .025 8 .2 -ej .03 7 6 -ej .04 6 5 -ej .060 8 3 -ej .061 4 3 -ej .062 2 1 -ej .063 3 1 -ej *DIV* 1 5
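The caption states that an admixed genome's split-time curve can be reconstructed as a linear mixture of other curves, but it does not give the fitting procedure. A minimal sketch, assuming non-negative least squares via `scipy.optimize.nnls` with a soft sum-to-one constraint imposed by an extra weighted row, is shown below. The weighting constant and the function name are illustrative choices, not the authors'.

```python
import numpy as np
from scipy.optimize import nnls


def fit_mixture(target, components, sum_weight=100.0):
    """Non-negative mixture weights w with components @ w ~= target.

    target: split-time (or cross-coalescence) curve of length T.
    components: T x K matrix whose columns are the candidate source curves.
    A row of ones scaled by sum_weight softly pushes the weights to sum to 1
    (an illustrative choice, not a documented part of the original method).
    """
    A = np.asarray(components, dtype=float)
    b = np.asarray(target, dtype=float)
    A_aug = np.vstack([A, sum_weight * np.ones(A.shape[1])])
    b_aug = np.append(b, sum_weight)
    weights, _residual_norm = nnls(A_aug, b_aug)
    return weights
```

The returned weights play the role of the admixture proportions printed above each component column; a larger `sum_weight` enforces the sum-to-one condition more strictly at the cost of fit to the curve.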
ED9. Modelling the xOoA components with fineSTRUCTURE.
A: Joint distribution of haplotype length and derived allele count, showing the median position of each cluster and all haplotypes assigned to it in the maximum a posteriori (MAP) estimate. Although a different proportion of points is assigned to each cluster in the MAP, the total posterior is very close to 1/K for all clusters. The dashed lines indicate a constant mutation rate. Haplotypes are ordered by mutation rate from low to high. B: Comparison of residual distributions for the two-component mixture using EUR.AFR and EUR.PNG (left) and the three-component mixture including xOoA (right; same colour scale). Without xOoA the residuals are larger (RMSE 0.0055 compared with 0.0018) and, more importantly, they are also structured. C: Assuming a mutational clock and correct assignment of haplotypes, the relative ages of the splits can be estimated from the number of derived alleles observed on the haplotypes. This leads to an estimate that the xOoA split is 1.5 times older than the Eurasian-African split.
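Panels B and C rest on two small calculations: the root-mean-square error of mixture residuals and, under a strict mutational clock, the ratio of per-site derived-allele rates as an estimate of the ratio of split ages. The sketch below writes both out; the function names are hypothetical, and any inputs you feed it should be counts from your own data.

```python
import numpy as np


def rmse(observed, fitted):
    """Root-mean-square error between observed and reconstructed values."""
    o = np.asarray(observed, dtype=float)
    f = np.asarray(fitted, dtype=float)
    return float(np.sqrt(np.mean((o - f) ** 2)))


def relative_split_age(derived_a, sites_a, derived_b, sites_b):
    """Under a strict clock, derived alleles accumulate in proportion to time,
    so the ratio of per-site derived-allele rates estimates the age of
    split a relative to split b."""
    return (derived_a / sites_a) / (derived_b / sites_b)
```

Dividing by the number of sites first matters because haplotypes assigned to different components have different lengths.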
ED10. Proposed xOoA model.
A subway-map figure illustrating the model suggested by the results presented here: an early, extinct Out-of-Africa (xOoA) expansion that left a signature in the genomes of Sahul populations upon their arrival in the region. Given the overall small contribution of this event to the genomes of modern Sahul individuals, we could not determine whether the documented Denisova admixture (question marks) and putative multiple Neanderthal admixtures took place along this extinct OoA route. We also speculate (question mark) that people who migrated along the xOoA route may have left a trace in the genome of the Altai Neanderthal, as reported by Kuhlwilm and colleagues.
Figure 1
Figure 1. Genetic barriers across space.
Spatial visualisation of genetic barriers inferred from genome-wide genetic distances, quantified as the magnitude of the gradient of spatially interpolated allele frequencies (values indicated by the colour bar; grey areas were land during the Last Glacial Maximum but are now under water). We used a spatial kernel smoothing method based on the matrix of pairwise average heterozygosity, and a MATLAB script that plots the hexagons of the grid colour-coded by gradient magnitude. Inset: partial correlation between the magnitude of genetic gradients and combinations of geographic factors, namely elevation (E), temperature (T) and precipitation (P), for genetic gradients from fineSTRUCTURE (red) and from allele frequencies (blue). This analysis (see SI1:2.2.2 for details) shows that genetic differences within this region correlate to some extent with physical barriers such as mountain ranges, deserts, forests and open water (such as the Wallace line).
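The caption describes kernel smoothing followed by a gradient magnitude. As a simplified sketch, assuming planar coordinates, a Gaussian kernel, a square (not hexagonal) grid and a single scalar value per sample rather than the full pairwise heterozygosity matrix, the two steps look like this in Python. The bandwidth and function names are hypothetical.

```python
import numpy as np


def kernel_smooth(xy, values, grid_x, grid_y, bandwidth):
    """Nadaraya-Watson (Gaussian kernel) interpolation of per-sample values
    onto a regular grid; returns an array of shape (len(grid_y), len(grid_x))."""
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)
    gx, gy = np.meshgrid(grid_x, grid_y)  # each of shape (ny, nx)
    # squared distance from every grid node to every sample: (ny, nx, n_samples)
    d2 = (gx[..., None] - xy[:, 0]) ** 2 + (gy[..., None] - xy[:, 1]) ** 2
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    return (w * values).sum(axis=-1) / w.sum(axis=-1)


def gradient_magnitude(surface, grid_x, grid_y):
    """Magnitude of the spatial gradient of a gridded surface."""
    dz_dy, dz_dx = np.gradient(surface, grid_y, grid_x)
    return np.hypot(dz_dx, dz_dy)
```

High values of `gradient_magnitude` mark places where the smoothed genetic surface changes quickly over short distances, which is how the figure's barriers are read.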
Figure 2
Figure 2. Evidence of an xOoA signature in the genomes of modern Papuans.
Panel A: MSMC split-time plot. The Yoruba-Eurasia split curve shows the mean of all Eurasian genomes against one Yoruba genome; the grey area marks the top and bottom 5% of runs. A Koinanbe genome was chosen as representative of the Sahul populations. Panels B-D: Decomposition of Papuan haplotypes inferred as African by fineSTRUCTURE. Panel B: Semi-parametric decomposition of the joint distribution of haplotype length and non-African derived allele rate per SNP, showing the proportion of haplotypes in each of K = 20 components (ordered by non-African derived allele rate), relative to the overall proportion of haplotypes in that component. The four datasets obtained by considering haplotypes inferred as African or Denisova in Europeans or Papuans are shown together with our inferred extra Out-of-Africa (xOoA) component. Panel C: Properties of the components in terms of non-African derived allele rate (on which the components are ordered) and length. Panel D: Reconstruction of haplotypes inferred as African in the genomes of Papuan individuals, using a mixture of all other data (red) and with the addition of the xOoA signature (black).
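A common convention for reading a split time off an MSMC cross-coalescence curve, as in panel A, is to take the point at which the relative cross-coalescence rate, $2\lambda_{01} / (\lambda_{00} + \lambda_{11})$, reaches 0.5. The sketch below assumes within-population ($\lambda_{00}$, $\lambda_{11}$) and cross-population ($\lambda_{01}$) coalescence rates on a shared time grid and interpolates linearly. The 0.5 threshold is a convention, not something this caption specifies, and the function names are hypothetical.

```python
import numpy as np


def relative_ccr(lambda_00, lambda_01, lambda_11):
    """Relative cross-coalescence rate 2 * l01 / (l00 + l11) per time interval."""
    l00, l01, l11 = (np.asarray(a, dtype=float) for a in (lambda_00, lambda_01, lambda_11))
    return 2 * l01 / (l00 + l11)


def split_time(times, rccr, threshold=0.5):
    """First time, moving back into the past, at which the relative
    cross-coalescence rate rises to `threshold` (near 0 after a split,
    near 1 before it), linearly interpolated; None if it never does."""
    t = np.asarray(times, dtype=float)
    r = np.asarray(rccr, dtype=float)
    for i in range(1, len(t)):
        if r[i - 1] < threshold <= r[i]:
            frac = (threshold - r[i - 1]) / (r[i] - r[i - 1])
            return float(t[i - 1] + frac * (t[i] - t[i - 1]))
    return None
```

Under this reading, a Papuan-African curve that reaches 0.5 further in the past than the Eurasian-African curve corresponds to the past-ward shift discussed for panel A.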


