Human Population History Revealed by a Supertree Approach


Pavel Duda et al. Sci Rep.


Over the past two decades, numerous new trees of modern human populations have been published, but little attention has been paid to formal phylogenetic synthesis. We utilized the "matrix representation with parsimony" (MRP) method to infer a composite phylogeny (supertree) of modern human populations, based on 257 genetic/genomic, as well as linguistic, phylogenetic trees and 44 admixture plots from 200 published studies (1990–2014). In the resulting supertree topology, S African Khoisan occupy the most basal position, followed by C African Pygmies and a paraphyletic section of all other sub-Saharan peoples. The sub-Saharan African section is basal to the monophyletic clade consisting of the N African–W Eurasian assemblage and the consistently monophyletic Eastern superclade (Sahul–Oceanian, E Asian, and Beringian–American peoples). This topology, dominated by genetic data, is well-resolved and robust to parameter set changes, with a few unstable areas (e.g., West Eurasia, Sahul–Melanesia) reflecting the existing phylogenetic controversies. A few populations were identified as highly unstable "wildcard taxa" (e.g., Andamanese, Malagasy). The linguistic classification fits rather poorly on the supertree topology, supporting the view that direct coevolution between genes and languages is far from universal.
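To make the MRP step concrete, the following is a minimal sketch of the standard Baum-Ragan matrix coding that underlies MRP, not the authors' actual pipeline. Each clade of each rooted source tree becomes one binary character: taxa inside the clade are scored 1, taxa in the same tree but outside the clade are scored 0, and taxa missing from that tree are scored "?". The nested-tuple tree encoding, the function names, the all-zero "ROOT" row label, and the toy taxa A to E are all assumptions made for illustration; the parsimony search that would then be run on the matrix is not shown.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clades(tree, is_root=True):
    """Yield the leaf set of every internal node below the root."""
    if isinstance(tree, str):
        return
    if not is_root:
        yield frozenset(leaves(tree))
    for child in tree:
        yield from clades(child, is_root=False)


def mrp_matrix(source_trees):
    """Build a Baum-Ragan matrix: one column per clade of each source tree."""
    all_taxa = sorted(set().union(*(leaves(t) for t in source_trees)))
    rows = {taxon: [] for taxon in all_taxa}
    for tree in source_trees:
        present = leaves(tree)
        for clade in clades(tree):
            for taxon in all_taxa:
                if taxon not in present:
                    rows[taxon].append("?")  # taxon not sampled in this tree
                elif taxon in clade:
                    rows[taxon].append("1")  # taxon inside the clade
                else:
                    rows[taxon].append("0")  # sampled, but outside the clade
    matrix = {taxon: "".join(states) for taxon, states in rows.items()}
    n_chars = len(next(iter(matrix.values())))
    # Hypothetical all-zero outgroup row used to root the parsimony analysis.
    matrix["ROOT"] = "0" * n_chars
    return matrix


# Toy source trees with partially overlapping taxon sets (illustrative only).
tree_a = ("A", ("B", ("C", "D")))
tree_b = ("A", ("C", ("D", "E")))
matrix = mrp_matrix([tree_a, tree_b])
for taxon, states in matrix.items():
    print(f"{taxon:5s} {states}")
```

Because the source trees only need to overlap partially in their taxa, this coding lets trees from different studies be combined into a single matrix, which is what makes a supertree of many published phylogenies feasible.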


Figure 1. (a) Semistrict consensus supertree of 186 human populations (outgroups not shown) based on the representative dataset and parameter set 1.A of the sensitivity analysis (all data partitions were weighted equally and all sources were considered rooted). SOUTH BANTU = Ndebele + Swati + Xhosa + Zulu (often occurred as a composite population in the source trees); AUSTRALIAN consists of Australian Aboriginal populations of unspecified ethnic origin. The color code corresponds to the recovered monophyletic or paraphyletic groups of populations. The wildcard taxa (Qatari, Andamanese, Malagasy, Dayak Ngaju) are displayed (in gray) in the most basal of all positions they acquired when included in the dataset, but were not taken into account when assessing node and group support. The circles indicate the presence of the nodes in the strict (white) and semistrict (gray) consensus of 16 supertrees derived from the sensitivity analysis (a circle is absent if the respective node is absent even in the semistrict consensus). The analysis space plots (square grids) describe the presence of the selected clades/groups in the supertree under individual parameter sets as either: a monophyletic clade (white); a paraphyletic group or an unresolved section compatible with monophyly or paraphyly (gray); a polyphyletic assemblage (black). Completely white grids (i.e., the group is present under all parameter sets) are substituted by small white squares. (b, c) Alternative topology for the N African–W Eurasian assemblage and the Sahul–Oceanian clade as recovered in parameter set 2.C. (d) Alternative topology for the E Asia clade as recovered in parameter sets 4.A–4.D. The nodes where the alternative topologies (b, c, d) begin in the supertree 1.A (a) are denoted by asterisks. (e) Geographic locations of 186 human populations plotted on the world map using QGIS v.2.8 (the color code corresponds to the trees).
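The distinction between strict and semistrict consensus used for the node circles can be illustrated with a short sketch. It assumes rooted trees on an identical taxon set, encoded as nested tuples, and uses the usual definitions: the strict consensus keeps only clades found in every tree, while the semistrict (combinable-component) consensus keeps every clade that appears in at least one tree and is contradicted by none. The function names and the toy trees are hypothetical and are not taken from the study.

```python
def leaf_set(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaf_set(child) for child in tree))


def clade_set(tree, is_root=True):
    """Return the leaf sets of all internal nodes below the root."""
    if isinstance(tree, str):
        return set()
    found = set() if is_root else {frozenset(leaf_set(tree))}
    for child in tree:
        found |= clade_set(child, is_root=False)
    return found


def compatible(a, b):
    """Two rooted clades can coexist if they are nested or disjoint."""
    return a <= b or b <= a or a.isdisjoint(b)


def strict_consensus(trees):
    """Clades present in every input tree."""
    return set.intersection(*(clade_set(t) for t in trees))


def semistrict_consensus(trees):
    """Clades present in some tree and contradicted by no tree."""
    per_tree = [clade_set(t) for t in trees]
    candidates = set.union(*per_tree)
    return {
        c for c in candidates
        if all(compatible(c, other) for clades in per_tree for other in clades)
    }


# Toy trees: t2 leaves (B, C, D) unresolved; t3 conflicts with t1.
t1 = ("A", ("B", ("C", "D")))
t2 = ("A", ("B", "C", "D"))
t3 = ("A", ("D", ("B", "C")))

strict_12 = strict_consensus([t1, t2])
semistrict_12 = semistrict_consensus([t1, t2])
semistrict_123 = semistrict_consensus([t1, t2, t3])
```

In this toy case the clade (C, D) is missing from the strict consensus of t1 and t2 because t2 does not resolve it, but it survives in the semistrict consensus because t2 does not contradict it. Adding t3, which groups B with C, removes it from the semistrict consensus as well.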
Figure 2. (a) Semistrict consensus supertree of 55 human populations based on the HGDP dataset and parameter set 1.A of the sensitivity analysis. Populations were renamed to correspond to those used in the HGDP panel. BANTU = Kikuyu; POLYNESIAN = Samoan + Maori; MICRONESIAN = Kosraean; MELANESIAN = Naasioi; PAPUAN = Goroka; COLOMBIAN = Piapoco + Curripaco. (b) Frequency-differences consensus of 14 supertrees based on parameter sets 1.B–4.C of the sensitivity analysis. (c) Semistrict consensus supertree based on parameter set 4.D of the sensitivity analysis. The geographic color code corresponds to Fig. 1.
Figure 3. The supertree constrained by Ethnologue classification.
White circles indicate topological constraints. Gray circles indicate an unconstrained taxon or clade (usually a language isolate) recovered within a constrained one. The geographic color code corresponds to Fig. 1.

