Hum Genet, 139 (1), 45-59

A Different View on Fine-Scale Population Structure in Western African Populations

Kridsadakorn Chaichoompu et al. Hum Genet.

Abstract

Owing to their long evolutionary history, African populations exhibit more genetic variation than any other population in the world. This diversity lends itself to subdividing Africans into groups of genetically similar individuals at varying degrees of granularity. It remains challenging to detect fine-scale structure in a computationally efficient and meaningful way. In this paper, we present a proof of concept of a novel fine-scale population structure detection tool, applied to Western African samples. These samples consist of 1396 individuals from 25 ethnic groups (two groups are African American descendants). The strategy is based on a recently developed tool called IPCAPS. IPCAPS, or Iterative Pruning to CApture Population Structure, is a genetic divisive clustering strategy that enhances iterative pruning PCA, is robust to outliers, and does not require a priori computation of haplotypes. Our strategy identified 12 groups in total, and 6 groups were revealed as fine-scale structure detected in the samples from Cameroon, Gambia, Mali, Southwest USA, and Barbados. Our findings help to explain evolutionary processes in the analyzed West African samples and raise awareness of fine-scale structure resolution when conducting genome-wide association and interaction studies.
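
The general idea of divisive clustering driven by repeated PCA can be illustrated with a minimal sketch. This is not the IPCAPS implementation: the stopping rule (`min_eigenvalue`), the minimum cluster size (`min_size`), and the split at the largest gap in first-component scores are simplifying assumptions chosen for illustration, and the sketch does not model outlier handling. Genotypes are assumed to be coded as 0/1/2 allele counts.

```python
import random

def top_component(rows, iters=200, seed=0):
    """Leading principal component of `rows` by power iteration.

    Returns (eigenvalue, scores), where scores are the projections of the
    column-centered rows onto the leading component.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    X = [[r[j] - means[j] for j in range(m)] for r in rows]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(m)]
    for _ in range(iters):
        # One power-iteration step: w = X^T (X v), then normalize.
        s = [sum(x[j] * v[j] for j in range(m)) for x in X]
        w = [sum(X[i][j] * s[i] for i in range(n)) for j in range(m)]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0:
            return 0.0, [0.0] * n  # no variance left in this group
        v = [c / norm for c in w]
    scores = [sum(x[j] * v[j] for j in range(m)) for x in X]
    eigenvalue = sum(s * s for s in scores) / (n - 1)
    return eigenvalue, scores

def divisive_clusters(rows, ids=None, min_size=2, min_eigenvalue=1.0):
    """Recursively split individuals along the first principal component.

    Hypothetical stopping rule: stop when the group is too small to split
    or when the leading eigenvalue falls below `min_eigenvalue`.
    """
    if ids is None:
        ids = list(range(len(rows)))
    if len(ids) < 2 * min_size:
        return [sorted(ids)]
    eigenvalue, scores = top_component([rows[i] for i in ids])
    if eigenvalue < min_eigenvalue:
        return [sorted(ids)]
    order = sorted(range(len(ids)), key=lambda k: scores[k])
    # Cut at the largest gap in sorted scores, keeping both sides >= min_size.
    cut = max(range(min_size, len(ids) - min_size + 1),
              key=lambda k: scores[order[k]] - scores[order[k - 1]])
    left = [ids[k] for k in order[:cut]]
    right = [ids[k] for k in order[cut:]]
    return (divisive_clusters(rows, left, min_size, min_eigenvalue)
            + divisive_clusters(rows, right, min_size, min_eigenvalue))

# Toy genotype matrix (made-up data): 8 individuals x 4 SNPs, two clear groups.
genotypes = [
    [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0],
    [2, 2, 2, 2], [2, 1, 2, 2], [2, 2, 1, 2], [2, 2, 2, 1],
]
clusters = divisive_clusters(genotypes)
```

On the toy matrix, the first split separates the two groups, and each subgroup is then left intact because its leading eigenvalue is below the assumed threshold. A real analysis would use a principled stopping criterion and far more markers.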

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Fig. 1
Geographical location of the African data set analyzed in this work. Abbreviations identify the following populations: ACB African Caribbean in Barbados, ASW African ancestry in Southwest USA, BGM Gamache in Burkina Faso, BGR Gurunsi in Burkina Faso, BM1 Mossi I in Burkina Faso, BM2 Mossi II in Burkina Faso, CBT Bantu in Cameroon, CSB Semi-Bantu in Cameroon, ESN Esan in Nigeria, GF1 Fula I in Gambia, GF2 Fula II in Gambia, GJL Jola in Gambia, GMD Mandinka II in Gambia, GMJ Manjago in Gambia, GNA Akans in Ghana, GNK Kasem in Ghana, GNN Nankam in Ghana, GSH Serehule in Gambia, GSR Serere in Gambia, GWD Gambian in Western Division, Mandinka, GWL Wollof in Gambia, MLB Bambara in Mali, MLM Malinke in Mali, MSL Mende in Sierra Leone, YRI Yoruba in Ibadan, Nigeria
Fig. 2
The first three principal components of the entire African data set before IPCAPS clustering. Highlighted points refer to ethnic groups
Fig. 3
Bubble plot of the IPCAPS clusters showing the composition of each cluster's members
Fig. 4
a ADMIXTURE clustering of the African data set. The numbers of ancestry groups (K) are between 3 and 5. The numbers (1–12) under the ADMIXTURE plot represent the IPCAPS groups. The group members are listed underneath the plot; the numbers in parentheses represent the numbers of individuals from those ethnic groups. b Geographic map showing, for each group, the geographic origin for the majority (less than five individuals) of group members. c Cross-validation (cv) error from ADMIXTURE based on tenfold cross-validation
Fig. 5
Concordance analysis between IPCAPS and fineSTRUCTURE. The dendrogram represents the identified groups by fineSTRUCTURE. These groups were uniquely matched to the 12 groups identified by IPCAPS; differences between the matched groups are indicated taking IPCAPS groups as reference
Fig. 6
Tissue specificity related to the differentially expressed genes (DEG) derived from the top-FST SNPs (99.9th percentile) across all cluster comparisons. A distinction is made between upregulated DEG (top), downregulated DEG (middle), and bidirectional DEG (bottom). The p values represent the probability from the hypergeometric test
Fig. 7
The lists of genes associated with the top-FST SNPs (99.9th percentile) between groups 2 and 3, groups 8 and 9, and groups 10 and 11 are shown in a, b, and c, respectively, as obtained from FUMA. The listed genes (orange) are from genome-wide association studies in the GWAS Catalog (Buniello et al. 2019). The proportions of overlapping genes in gene sets are shown in red, and the enrichment p values are shown in blue. The lists of GWAS experiments were filtered by enrichment p value ≥ 0.0001 (− log10(0.0001) = 4)
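
The captions for Fig. 6 and Fig. 7 refer to hypergeometric-test p values and to a − log10 scale. As a minimal sketch of how such an enrichment p value can be computed, the function below evaluates a generic one-sided hypergeometric test. The function name and the gene counts are hypothetical, and this is not claimed to reproduce the exact procedure used by FUMA.

```python
from math import comb, log10

def hypergeom_enrichment_p(total_genes, set_genes, drawn_genes, overlap):
    """One-sided hypergeometric p value, P(X >= overlap).

    total_genes: size of the background gene universe
    set_genes:   genes in the tested gene set (e.g. a tissue-specific set)
    drawn_genes: genes in the input list (e.g. genes mapped from top-FST SNPs)
    overlap:     genes shared by the input list and the tested set
    """
    upper = min(set_genes, drawn_genes)
    favorable = sum(
        comb(set_genes, i) * comb(total_genes - set_genes, drawn_genes - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / comb(total_genes, drawn_genes)

# Made-up counts for illustration: 10 background genes, 4 in the set,
# 3 in the input list, 2 of which overlap the set.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)

# Conversion used on the plotted scale: a p value of 0.0001 maps to 4.
threshold_on_log_scale = -log10(0.0001)
```

With these toy counts, the tail probability is (36 + 4) / 120 = 1/3, which can be checked by hand.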

References

    1. Abegaz F, Chaichoompu K, Génin E, et al. Principals about principal components in statistical genetics. Brief Bioinform. 2018. doi: 10.1093/bib/bby081.
    2. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–1664. doi: 10.1101/gr.094052.109.
    3. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol. 1995;57:289–300. doi: 10.2307/2346101.
    4. Bhatia G, Patterson N, Sankararaman S, Price AL. Estimating and interpreting FST: the impact of rare variants. Genome Res. 2013;23:1514–1521. doi: 10.1101/gr.154831.113.
    5. Bouaziz M, Paccard C, Guedj M, Ambroise C. SHIPS: spectral hierarchical clustering for the inference of population structure in genetic studies. PLoS One. 2012;7:e45685. doi: 10.1371/journal.pone.0045685.
