
Admixture Into and Within sub-Saharan Africa

George BJ Busby et al. eLife.

Abstract

Similarity between two individuals in the combination of genetic markers along their chromosomes indicates shared ancestry and can be used to identify historical connections between different population groups due to admixture. We use a genome-wide, haplotype-based analysis to characterise the structure of genetic diversity and gene-flow in a collection of 48 sub-Saharan African groups. We show that coastal populations experienced an influx of Eurasian haplotypes over the last 7000 years, and that Eastern and Southern Niger-Congo speaking groups share ancestry with Central West Africans as a result of recent population expansions. In fact, most sub-Saharan populations share ancestry with groups from outside of their current geographic region as a result of gene-flow within the last 4000 years. Our in-depth analysis provides insight into haplotype sharing across different ethno-linguistic groups and the recent movement of alleles into new environments, both of which are relevant to studies of genetic epidemiology.

Keywords: Africa; admixture; chromosome painting; evolutionary biology; gene-flow; genomics; human.

Conflict of interest statement

The authors declare that no competing interests exist.

Figures

Figure 1. Sub-Saharan African genetic variation is shaped by ethno-linguistic and geographical similarity.
(A) The origin of the 46 African ethnic groups used in the analysis; ethnic groups from the same country are given the same colour, but different shapes; the legend describes the identity of each point. Figure 1—figure supplement 1 and Figure 1—source data 1 provide further detail on the provenance of these samples. (B) PCA shows that the first major axis of variation in Africa (PC1, y-axis) splits southern groups from the rest of Africa; each symbol represents an individual; PC2 (x-axis) reflects ethno-linguistic differences, with Niger-Congo speakers split from Afroasiatic and Nilo-Saharan speakers. Tick marks here and in (C) show the scale. (C) The third principal component (PC3, x-axis) represents geographical separation of Niger-Congo speakers, forming a cline from west to east Africans. (D) Results of the fineSTRUCTURE clustering analysis using copying vectors generated from chromosome painting; each row of the heatmap is a recipient copying vector showing the number of chunks shared between the recipient and every individual as a donor (columns); the tree clusters individuals with similar copying vectors together, such that block-like patterns are observed on the heatmap; darker colours on the heatmap represent more haplotype sharing (see text for details); individual tips of the tree are coloured by country of origin, and the seven ancestry regions are identified and labelled to the left of the tree; labels in parentheses describe the major linguistic type of the ethnic groups within: AA = Afroasiatic, KS = Khoesan, NC = Niger-Congo, NS = Nilo-Saharan. DOI: http://dx.doi.org/10.7554/eLife.15266.003
Figure 1—figure supplement 1. Map of populations used in the analysis.
Population names are coloured by the country of origin; positions of the countries are shown on the map. Individual point labels, which are used throughout this paper, are shown for each population in the legend. Sample provenance is shown immediately after the population name in circular parentheses and final number of individuals is shown in square parentheses. DOI: http://dx.doi.org/10.7554/eLife.15266.005
Figure 1—figure supplement 2. An example of hierarchical clustering to choose two groups of similar individuals from the Fula based on a PCA of The Gambia.
Projected onto a PCA of Gambian genetic variation where each point represents an individual, all Fula individuals are coloured, with the colour depicting their cluster assignment, based on the MClust clustering algorithm. We chose individuals from the green (D) and light blue (E) clusters to maximise the representation of Fula genetic variation. Note that the majority of the individuals from the other 6 Gambian ethnic groups occur in the right arm of the PCA. An analogous process was performed for all ethnic groups from the MalariaGEN dataset where more than 50 individuals were available. DOI: http://dx.doi.org/10.7554/eLife.15266.006
Figure 1—figure supplement 3. fineSTRUCTURE analysis of the full dataset.
We show the tree output from a single run of the fineSTRUCTURE algorithm. To aid reading, the tree has been split in two, East and Southern African groups are on the left, West and Central West African groups are on the right. Leaves are labelled by the identity of the individuals within them, with the total number of individuals in the clusters shown in parentheses. Leaves are coloured by the country of origin (as in Figure 1—figure supplement 1) and branches are coloured by the final ancestry region that the clusters were assigned to. Note that although Malawi and Cameroon individuals were located in a clade with mostly East African individuals, they were assigned to Southern and Central West African ancestry regions, respectively. Clades containing outlying individuals from the Fula and Mandinka are also shown. DOI: http://dx.doi.org/10.7554/eLife.15266.007
Figure 2. Haplotypes capture more population structure than independent loci.
(A) For each population pair, we estimated pairwise FST (upper right triangle) using 328,000 independent SNPs, and TVD (lower left triangle) using population averaged copying vectors from CHROMOPAINTER. TVD measures the difference between two copying vectors. (B) Comparison of pairwise FST and TVD shows that they are not linearly related: some population pairs have low FST and high TVD. (Source data are detailed in Figure 2—source data 1 and Figure 2—source data 2.) DOI: http://dx.doi.org/10.7554/eLife.15266.008
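As a rough illustration of how TVD compares two copying vectors, the sketch below uses the standard total variation distance, half the sum of absolute differences between two vectors that each sum to 1. The function names and the toy three-donor vectors are hypothetical, and the exact normalisation used in the paper is assumed rather than confirmed by the caption.

```python
def normalise(vector):
    """Rescale a copying vector so that its entries sum to 1."""
    total = sum(vector)
    if total <= 0:
        raise ValueError("copying vector must have a positive sum")
    return [value / total for value in vector]


def tvd(vector_a, vector_b):
    """Total variation distance between two copying vectors.

    Assumption: TVD = 0.5 * sum(|a_i - b_i|) over donors, the standard
    definition for two discrete distributions.
    """
    if len(vector_a) != len(vector_b):
        raise ValueError("copying vectors must cover the same donors")
    a = normalise(vector_a)
    b = normalise(vector_b)
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical population-averaged copying vectors over three donor groups.
pop_a = [0.5, 0.3, 0.2]
pop_b = [0.2, 0.3, 0.5]
print(tvd(pop_a, pop_b))  # approximately 0.3
```

Under this definition TVD is 0 for identical copying vectors and 1 for vectors that share no donors.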
Figure 2—figure supplement 1. Haplotypic analysis of populations from the Central West Africa ancestry region accesses fine-scale population differentiation.
Here we show a comparison of principal components analysis (PCA), which uses genotype data, with fineSTRUCTURE, which uses haplotypic information in the form of painted chromosomes. The five plots in the top panel show the results of the main PCA based on genotype data. Symbols represent individuals and are detailed in the legend. PC3 differentiates the Yoruba from other groups in the region, but individuals from Central West Africa overlap at the remaining PCs, suggesting close genealogical relationships between individuals. The lower panel shows the results of the chromosome painting analysis, which we used with fineSTRUCTURE, where all individuals were allowed to copy from all other individuals. The rows of the heatmap represent un-normalised, individual copying vectors, with the mean number of chunks copied from each donor region as columns. Subtle differences in the copying of individuals from each of the five Central West African groups can be seen, which fineSTRUCTURE uses to cluster individuals into four clusters. We show a close-up of the fineSTRUCTURE tree from Figure 1 on the right of the bottom panel. Each group separates into its own cluster, with the exception of two of the groups from Ghana, the Kasem and Namkam, which are put in the same fineSTRUCTURE cluster. DOI: http://dx.doi.org/10.7554/eLife.15266.013
Figure 3. Inference of admixture in sub-Saharan Africa using MALDER.
We used MALDER to identify the evidence for multiple waves of admixture in each population. (A) For each population, we show the ancestry region identity of the two populations involved in generating the MALDER curves with the greatest amplitudes (coloured blocks) for at most two events. The major contributing sources are highlighted with a black box. Populations are ordered by ancestry of the admixture sources and date estimates, which are shown ± 1.96 × s.e. For each event we compared the MALDER curves with the greatest amplitude to other curves involving populations from different ancestry regions. In the central panel, for each source, we highlight the ancestry regions providing curves that are not significantly different from the best curves. In the Jola, for example, this analysis shows that, although the curve with the greatest amplitude is given by Khoesan (green) and Eurasian (dark yellow) populations, curves containing populations from any other African group (apart from Afroasiatic) in place of a Khoesan population are not significantly smaller than this best curve (SOURCE 1). Conversely, when comparing curves where a Eurasian population is substituted with a population from another group, all curve amplitudes are significantly smaller (Z<2). (B) Comparison of dates of admixture ± 1.96 × s.e. for MALDER dates inferred using the HAPMAP recombination map and a recombination map inferred from European (CEU) individuals from Hinch et al. (2011). We only show comparisons for dates where the same number of events were inferred using both methods. Point symbols refer to populations and are as in Figure 1. (C) As (B), but the comparison uses an African (YRI) map. Source data can be found in Figure 3—source data 1. DOI: http://dx.doi.org/10.7554/eLife.15266.014
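The ± 1.96 × s.e. ranges in this caption correspond to approximate 95% intervals around each date estimate. A minimal sketch of that arithmetic follows; the function names, the example estimate of 50 generations with a standard error of 5, and the 29-year generation time are all hypothetical values chosen for illustration, not figures taken from the paper.

```python
def admixture_date_interval(generations, standard_error, z=1.96):
    """Return (lower, upper) bounds of estimate +/- z * s.e., in generations.

    z=1.96 gives an approximate 95% interval; z=1.0 gives the +/- 1 s.e.
    range used in some of the supplementary figures.
    """
    half_width = z * standard_error
    return generations - half_width, generations + half_width


def generations_to_years(generations, years_per_generation):
    """Convert a date in generations to years before sampling."""
    return generations * years_per_generation


# Hypothetical estimate: 50 generations ago with a standard error of 5.
lower, upper = admixture_date_interval(50.0, 5.0)
print(lower, upper)

# Assumed generation time of 29 years, for illustration only.
print(generations_to_years(lower, 29.0), generations_to_years(upper, 29.0))
```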
Figure 3—figure supplement 1. Weighted LD amplitudes for a selection of 9 ethnic groups.
For a given test population we show the amplitude (± 1 s.e.) computed using the test population and every other population as the second reference. Plotted are the fitted amplitudes for each set of curves with the population used labelled beneath, with populations ordered by amplitude. A large number of populations showed a similar profile to (A), that is, with Eurasian populations showing the highest amplitudes. Other populations, e.g. Malawi, obtained the largest amplitudes from an African population. DOI: http://dx.doi.org/10.7554/eLife.15266.019
Figure 3—figure supplement 2. Comparison of weighted LD amplitude scores across all African ethnic groups.
For a given test population we computed the ALDER amplitude (y-axis intercept) using the test population and every other population as the second reference. We then ranked the amplitudes across a given test population: populations that gave the top-ranked (i.e. largest) amplitude are in green, with those beneath rank 15 shown in grey. This analysis shows that for many populations the reference populations giving the largest amplitudes (i.e. those with the highest rank) are often non-African groups. DOI: http://dx.doi.org/10.7554/eLife.15266.020
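To make the link between the amplitude (y-axis intercept) and the admixture date concrete, the sketch below assumes the usual single-pulse form of a weighted LD curve, amplitude × exp(−n × d), where d is genetic distance in Morgans and n is the number of generations since admixture. It generates a noise-free curve from hypothetical parameters and recovers them with a log-linear least-squares fit. It is a toy illustration only: it ignores the affine term, multiple admixture pulses and the jackknife standard errors, and it is not the fitting procedure implemented in ALDER or MALDER.

```python
import math


def decay_curve(distance_morgans, amplitude, generations):
    """Single-pulse weighted LD curve: amplitude * exp(-generations * d)."""
    return amplitude * math.exp(-generations * distance_morgans)


def fit_log_linear(distances, values):
    """Least-squares fit of log(value) against distance.

    Returns (amplitude, generations): the intercept of the curve at d = 0
    and the decay rate. Requires strictly positive values.
    """
    n = len(distances)
    logs = [math.log(v) for v in values]
    mean_d = sum(distances) / n
    mean_l = sum(logs) / n
    sxx = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_d
    return math.exp(intercept), -slope


# Hypothetical curve: amplitude 0.002, admixture 60 generations ago,
# sampled from a minimum distance of 0.5 cM in 0.1 cM steps.
min_distance_cm = 0.5
distances = [(min_distance_cm + 0.1 * i) / 100.0 for i in range(100)]
values = [decay_curve(d, 0.002, 60.0) for d in distances]

fitted_amplitude, fitted_generations = fit_log_linear(distances, values)
print(fitted_amplitude, fitted_generations)
```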
Figure 3—figure supplement 3. Comparison of the minimum distance to begin computing admixture LD.
For each of the 48 African populations as a target, we used ALDER to compute the minimum distance over which short-range LD is shared with each of the 47 other African and 12 Eurasian reference populations. Boxplots show the distribution of minimum inferred genetic distances (y-axis) over which LD is shared for each of the reference populations separately (x-axis). We performed two analyses using weighted LD, one using these values of the minimum distance inferred from the data, and another where this distance was forced to be 0.5cM (dotted red line). Across all African populations we observe LD correlations with other African populations at genetic distances > 0.5cM, with median values ranging from 0.7cM when GUMUZ is used as a reference to 1.4cM when FULAII is used as a reference. In fact, when we further explore the range of these values across each region separately (Figure 3—figure supplement 4), we note that, as expected, these distances are greater between more closely related groups. DOI: http://dx.doi.org/10.7554/eLife.15266.021
Figure 3—figure supplement 4. Comparison of the minimum distance to begin computing admixture LD split by region.
As in Figure 3—figure supplement 3 except distances are stratified by region. The median minimum distance over which all sub-Saharan African populations have correlated LD is always greater than 0.5cM. Taken together with the results described in Figure 3—figure supplement 3, this suggests that all African populations share some LD over short genetic distances, which may be the result of shared demography or admixture. (Note that ALDER computes LD correlations at distances <2cM.) DOI: http://dx.doi.org/10.7554/eLife.15266.022
Figure 3—figure supplement 5. Results of MALDER for all populations using a European-specific recombination map.
We used MALDER to identify the evidence for multiple waves of admixture in each population. (A) For each population, we show the ancestry region identity of the two populations involved in generating the MALDER curves with the greatest amplitudes (which are the closest to the true admixing sources amongst the reference populations) for at most two events. The sources generating the greatest amplitude are highlighted with a black box. Populations are ordered by ancestry of the admixture sources and date estimates, which are shown ± 1 s.e. (B) Comparison of dates of admixture ± 1 s.e. for MALDER dates inferred using the HAPMAP recombination map and a recombination map inferred from European (CEU) individuals from Hinch et al. (2011). We only show comparisons for dates where the same number of events were inferred using both methods. Point symbols refer to populations and are as in Figure 1. (C) As (B), but comparing with an African (YRI) map. DOI: http://dx.doi.org/10.7554/eLife.15266.023
Figure 3—figure supplement 6. Results of the MALDER analysis computing weighted admixture decay curves from 0.5cM.
As in the main analyses, the algorithm was run independently three times with the HAPMAP, YRI, and CEU genetic maps. The main results shown here are from the HAPMAP analysis. (A) For each population, we show the ancestry region identity of the two populations involved in generating the MALDER curves with the greatest amplitudes (which are the closest to the true admixing sources amongst the reference populations) for at most two events. The sources generating the greatest amplitude are highlighted with a black box. Populations are ordered by ancestry of the admixture sources and date estimates, which are shown ± 1 s.e. (B) Comparison of dates of admixture ± 1 s.e. for MALDER dates inferred using the HAPMAP recombination map and a recombination map inferred from European (CEU) individuals from Hinch et al. (2011). We only show comparisons for dates where the same number of events were inferred using both methods. Point symbols refer to populations and are as in Figure 1. (C) As (B), but comparing with an African (YRI) map. DOI: http://dx.doi.org/10.7554/eLife.15266.024
Figure 3—figure supplement 7. Results of MALDER for all populations using an African-specific recombination map.
We used MALDER to identify the evidence for multiple waves of admixture in each population. (A) For each population, we show the ancestry region identity of the two populations involved in generating the MALDER curves with the greatest amplitudes (which are the closest to the true admixing sources amongst the reference populations) for at most two events. The sources generating the greatest amplitude are highlighted with a black box. Populations are ordered by ancestry of the admixture sources and date estimates, which are shown ± 1 s.e. (B) Comparison of dates of admixture ± 1 s.e. for MALDER dates inferred using the HAPMAP recombination map and a recombination map inferred from European (CEU) individuals from Hinch et al. (2011). We only show comparisons for dates where the same number of events were inferred using both methods. Point symbols refer to populations and are as in Figure 1. (C) As (B), but comparing with an African (YRI) map. DOI: http://dx.doi.org/10.7554/eLife.15266.025
Figure 4. Inference of admixture in sub-Saharan Africa using GLOBETROTTER.
(A) For each group we show the ancestry region identity of the best matching source for the first and, if applicable, second events. Events involving sources that most closely match FULAI and SEMI-BANTU are highlighted by golden and red colours, respectively. Second events can be either multiway, in which case there is a single date estimate, or two-date, in which case 2ND EVENT refers to the earlier event. The point estimate of the admixture date is shown as a black point, with 95% CI shown with lines. MIXTURE MODEL: We infer the ancestry composition of each African group by fitting its copying vector as a mixture of all other population copying vectors. The coefficients of this regression sum to 1 and are coloured by ancestry region. 1ST EVENT SOURCES and 2ND EVENT SOURCES show the ancestry breakdown of the admixture sources inferred by GLOBETROTTER, coloured by ancestry region as in the key top right. (B) and (C) Comparisons of dates inferred by MALDER and GLOBETROTTER. Because the two methods sometimes inferred different numbers of events, in (B) we show the comparison based on the inferred number of events in the MALDER analysis, and in (C) for the number of events inferred by GLOBETROTTER. Point symbols refer to populations and are as in Figure 1 and source data can be found in Figure 4—source data 1. DOI: http://dx.doi.org/10.7554/eLife.15266.026
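The MIXTURE MODEL panel describes fitting a group's copying vector as a mixture of other populations' copying vectors, with coefficients that sum to 1. A minimal sketch of one way to do this is given below: least squares with the coefficients constrained to be non-negative and to sum to 1, solved by projected gradient descent. The non-negativity constraint, the solver, the function names and the toy vectors are all assumptions made for illustration; the caption does not specify how the regression was actually fitted.

```python
def project_to_simplex(vector):
    """Euclidean projection onto {w : w_i >= 0, sum(w) = 1}."""
    ordered = sorted(vector, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, value in enumerate(ordered, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / j
        if value - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in vector]


def fit_mixture(target, sources, steps=2000):
    """Fit target ~ sum_k w_k * sources[k] with w on the simplex."""
    k = len(sources)
    # Safe step size: 1 / (2 * squared Frobenius norm of the source matrix).
    rate = 1.0 / (2.0 * sum(x * x for s in sources for x in s))
    weights = [1.0 / k] * k
    for _ in range(steps):
        fitted = [sum(w * s[i] for w, s in zip(weights, sources))
                  for i in range(len(target))]
        residual = [f - t for f, t in zip(fitted, target)]
        gradient = [2.0 * sum(r * x for r, x in zip(residual, s))
                    for s in sources]
        weights = project_to_simplex(
            [w - rate * g for w, g in zip(weights, gradient)])
    return weights


# Hypothetical copying vectors for three surrogate populations.
sources = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
true_weights = [0.6, 0.3, 0.1]
target = [sum(w * s[i] for w, s in zip(true_weights, sources))
          for i in range(3)]

weights = fit_mixture(target, sources)
print(weights)  # close to [0.6, 0.3, 0.1]
```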
Figure 4—figure supplement 1. Admixture source inference by GLOBETROTTER after sequentially removing local surrogates from the analysis.
In addition to the Full analysis, we show the inferred composition of admixture sources for different, restricted surrogate analyses. Components and y-axis labels are coloured by ancestry region. In each case we show admixture sources inferred by GLOBETROTTER for a single date of admixture. DOI: http://dx.doi.org/10.7554/eLife.15266.029
Figure 4—figure supplement 2. Admixture source inference by GLOBETROTTER after sequentially removing local surrogates from the analysis.
The results are the same as Figure 4—figure supplement 1, but only Niger-Congo speaking groups are coloured. We highlight Malawi components in black, and Cameroon (Bantu and Semi-Bantu) in red. DOI: http://dx.doi.org/10.7554/eLife.15266.030
Figure 5. A timeline of recent admixture in sub-Saharan Africa.
For all events involving recipient groups from each ancestry region (columns) we combine all date bootstrap estimates generated by GLOBETROTTER and show the densities of these dates separately for the minor (above line) and major (below line) sources of admixture. Dates are additionally stratified by the ancestry region of the surrogate populations (rows), with all dates involving Niger Congo speaking regions combined together (All Niger Congo). Within each panel, the densities are coloured by the ancestry region origin of the surrogates, and in proportion to the components of admixture involved in the admixture event. The integrals of the densities are proportional to the admixture proportions of the events contributing to them. DOI: http://dx.doi.org/10.7554/eLife.15266.031
Figure 6. The geography of recent gene-flow in Africa.
We summarise gene-flow events in Africa using the results of the GLOBETROTTER analysis. For each ethnic group, we inferred the composition of the admixture sources, and link recipient populations to surrogates using arrows, the width of which is proportional to the amount each surrogate contributes to the admixture event. We separately plot (A) all events involving admixture source components from the Bantu and Semi-Bantu ethnic groups in Cameroon; (B) all events involving admixture sources from East and Southern African Niger-Congo speaking groups; (C) events involving admixture sources from West African Niger-Congo and East African Nilo-Saharan / Afroasiatic groups; (D) all events involving components from Eurasia. In (D), arrows are linked to the labelled 1KGP Eurasian groups. Arrows are coloured by country of origin, as in Figure 1—figure supplement 1. Numbers 1–8 in circles represent the events highlighted in the section "A haplotype-based model of gene-flow in sub-Saharan Africa". An alternative version of this plot, stratified by date, is shown in Figure 6—figure supplement 1. DOI: http://dx.doi.org/10.7554/eLife.15266.032
Figure 6—figure supplement 1. Gene-flow in Africa over the last 2000 years.
Using the results of the GLOBETROTTER analysis we show the connections between different groups in sub-Saharan Africa over time. For each population, we inferred the date of admixture and the composition of the admixing sources. We link each recipient population to its donor components using arrows, the size of which is proportional to the amount it contributes to the admixture event. Arrows are coloured by country of origin, as in Figure 4 in the main text. DOI: http://dx.doi.org/10.7554/eLife.15266.033

References

    1. 1000 Genomes Project Consortium. Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632.
    2. Allen JDV. Swahili Origins: Swahili Culture & the Shungwaya Phenomenon. James Currey Publishers; 1993.
    3. Ansari Pour N, Plaster CA, Bradman N. Evidence from Y-chromosome analysis for a late exclusively eastern expansion of the Bantu-speaking people. European Journal of Human Genetics. 2013;21:423–429. doi: 10.1038/ejhg.2012.176.
    4. Band G, Le QS, Jostins L, Pirinen M, Kivinen K, Jallow M, Sisay-Joof F, Bojang K, Pinder M, Sirugo G, Conway DJ, Nyirongo V, Kachala D, Molyneux M, Taylor T, Ndila C, Peshu N, Marsh K, Williams TN, Alcock D, Andrews R, Edkins S, Gray E, Hubbart C, Jeffreys A, Rowlands K, Schuldt K, Clark TG, Small KS, Teo YY, Kwiatkowski DP, Rockett KA, Barrett JC, Spencer CC, Malaria Genomic Epidemiology Network. Imputation-based meta-analysis of severe malaria in three African populations. PLoS Genetics. 2013;9:e1003509. doi: 10.1371/journal.pgen.1003509.
    5. Barham L, Mitchell P. The First Africans: African Archaeology From the Earliest Toolmakers to Most Recent Foragers (1st ed). Cambridge, UK: Cambridge University Press; 2008.
