Bayesian Phylogenetic Analysis of Semitic Languages Identifies an Early Bronze Age Origin of Semitic in the Near East


Andrew Kitchen et al. Proc Biol Sci.


The evolution of languages provides a unique opportunity to study human population history. The origin of Semitic and the nature of dispersals by Semitic-speaking populations are of great importance to our understanding of the ancient history of the Middle East and Horn of Africa. Semitic populations are associated with the oldest written languages and urban civilizations in the region, which gave rise to some of the world's first major religious and literary traditions. In this study, we employ Bayesian computational phylogenetic techniques recently developed in evolutionary biology to analyse Semitic lexical data by modelling language evolution and explicitly testing alternative hypotheses of Semitic history. We implement a relaxed linguistic clock to date language divergences and use epigraphic evidence for the sampling dates of extinct Semitic languages to calibrate the rate of language evolution. Our statistical tests of alternative Semitic histories support an initial divergence of Akkadian from ancestral Semitic over competing hypotheses (e.g. an African origin of Semitic). We estimate an Early Bronze Age origin for Semitic approximately 5750 years ago in the Levant, and further propose that contemporary Ethiosemitic languages of Africa reflect a single introduction of early Ethiosemitic from southern Arabia approximately 2800 years ago.
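The statistical tests of alternative histories mentioned above are described in the Figure 2 caption as log Bayes factor (BF) comparisons. As a minimal sketch, assuming a log marginal likelihood has already been estimated for each hypothesis, the comparison reduces to a difference of logs. The hypothesis names and numeric values below are hypothetical placeholders, not results from the study.

```python
def log_bayes_factor(log_ml_a: float, log_ml_b: float) -> float:
    """Log Bayes factor of hypothesis A over hypothesis B.

    Both arguments are log marginal likelihoods; a positive result favours A.
    """
    return log_ml_a - log_ml_b


# Hypothetical log marginal likelihoods (illustrative only, not from the paper).
log_ml = {
    "akkadian_first": -1000.0,
    "african_origin": -1005.5,
}

# Pick the best-supported hypothesis and compare every hypothesis against it.
best = max(log_ml, key=log_ml.get)
log_bf = {name: log_bayes_factor(log_ml[best], value) for name, value in log_ml.items()}

for name, value in log_bf.items():
    print(f"log BF({best} vs {name}) = {value:.2f}")
```

A log BF near zero means the data do not discriminate between the two hypotheses, which is the situation the Figure 2 caption calls equivocal for the placement of Arabic.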


Figure 1
Map of Semitic languages and inferred dispersals. The locations of all languages sampled in this study, both extinct and extant, are depicted on the map. The current distribution of Ethiosemitic languages follows Bender (1971) and distribution of the remaining languages follows Hetzron (1997). The ancient distribution of extinct languages is also indicated (i.e. Akkadian, Biblical Aramaic, Ge'ez, ancient Hebrew and Ugaritic; Bender 1971; Hetzron 1997). The West Gurage (Chaha, Geto, Innemor, Mesmes and Mesqan) and East Gurage (Walani and Zway) Ethiosemitic language groups in central Ethiopia are depicted as two combined groups. The map also presents the dispersal of Semitic languages inferred from our study. An origin of Afroasiatic along the African coast of the Red Sea, supported by comparative analyses (Ehret 1995; Ehret et al. 2004), is indicated in red, although other African origins of Afroasiatic have been proposed (e.g. southwest Ethiopia; Blench 2006). The assumed location of the divergence of ancestral Semitic from Afroasiatic between the African coast of the Red Sea and the Near East is indicated in italics. Semitic dispersals are depicted by arrows coloured according to the estimated time of divergence (see coloured time scale at top of figure), and important nodes from the phylogeny (figure 2) are placed on the arrows to indicate where and when these divergences occurred.

Figure 2
Phylogeny of Semitic languages. Our phylogeny of 25 Semitic languages based on binary encoded data is presented with mean divergence times to the right of each node and 95% highest posterior density (HPD) intervals indicated by light grey bars. The scale bar along the bottom of the phylogeny presents time in years before present (YBP). Posterior probabilities are printed in italics above each branch with >0.75 support. Extinct languages are underlined and all other languages are considered to evolve to the present. Subgroups of Semitic are identified by colour bars to the right of the phylogeny (purple bars, East Semitic; green bars, Central Semitic; red bars, Modern South Arabian (MSA); and blue bars, Ethiosemitic) and by three boxes (West, Central and South Semitic). Important nodes are indicated by letters: A, West Semitic; B, Central Semitic; C, Ugaritic–Hebrew–Aramaic; D, Arabic; E, South Semitic; F, MSA; and G, Ethiosemitic. The dashed line leading to Arabic reflects the fact that log Bayes factor (BF) tests were equivocal in the placement of Arabic, so we placed Arabic in Central Semitic based on previous linguistic studies (e.g. Hetzron 1976; Faber 1997). The topology is rooted with Akkadian, which is preferred by our log BF analyses, and follows the constraints of the standard model.
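The 95% HPD intervals in the caption above summarise the posterior distribution of each divergence time. As a rough illustration, assuming a list of posterior samples for one node age is available, an HPD interval can be approximated as the narrowest window that contains the required fraction of the sorted samples. The function name and the sample values are hypothetical, and this is not necessarily how the authors' software computes the interval.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Approximate HPD interval: the narrowest range covering `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("samples must not be empty")
    # Number of samples the interval has to contain.
    k = max(1, math.ceil(mass * n))
    best_lo, best_hi = xs[0], xs[k - 1]
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    for i in range(n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


# Hypothetical posterior samples of a node age in years before present.
node_age_samples = [0, 10, 11, 12, 13, 14, 30, 40, 50, 60]
interval_50 = hpd_interval(node_age_samples, mass=0.5)
print(interval_50)
```

Unlike an equal-tailed interval, this window shifts towards the region where the samples are densest, which matters for skewed posteriors such as divergence times.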

