Comparative Study
Am J Hum Genet, 74 (5), 1023-34

Origin, Diffusion, and Differentiation of Y-chromosome Haplogroups E and J: Inferences on the Neolithization of Europe and Later Migratory Events in the Mediterranean Area

Ornella Semino et al. Am J Hum Genet.

Abstract

The phylogeography of Y-chromosome haplogroups E (Hg E) and J (Hg J) was investigated in >2400 subjects from 29 populations, mainly from Europe and the Mediterranean area but also from Africa and Asia. The observed 501 Hg E and 445 Hg J samples were subtyped using 36 binary markers and eight microsatellite loci. Spatial patterns reveal that (1) the two sister clades, J-M267 and J-M172, are distributed differentially within the Near East, North Africa, and Europe; (2) J-M267 was spread by two temporally distinct migratory episodes, the most recent one probably associated with the diffusion of Arab people; (3) E-M81 is typical of Berbers, and its presence in Iberia and Sicily is due to recent gene flow from North Africa; (4) J-M172(xM12) distribution is consistent with a Levantine/Anatolian dispersal route to southeastern Europe and may reflect the spread of Anatolian farmers; and (5) E-M78 (for which microsatellite data suggest an eastern African origin) and, to a lesser extent, J-M12(M102) lineages would trace the subsequent diffusion of people from the southern Balkans to the west. A 7%–22% contribution of Y chromosomes from Greece to southern Italy was estimated by admixture analysis.
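The admixture estimate combines haplogroup frequencies in a hybrid population with those in two putative parental populations. The abstract does not name the estimator, so the following is only a minimal sketch of one simple least-squares approach, not necessarily the authors' method: each haplogroup frequency in the hybrid is modeled as m·p1 + (1 − m)·p2, and m is chosen to minimize the squared residuals. The function name and the frequency vectors in the usage line are hypothetical.

```python
def admixture_proportion(hybrid, parent1, parent2):
    """Least-squares estimate of m in hybrid ~ m*parent1 + (1 - m)*parent2.

    Each argument is a list of haplogroup frequencies in the same order.
    Minimizes sum(((h - p2) - m*(p1 - p2))**2) over m (closed form).
    Hypothetical helper; not the method used in the paper.
    """
    num = 0.0
    den = 0.0
    for h, p1, p2 in zip(hybrid, parent1, parent2):
        num += (h - p2) * (p1 - p2)
        den += (p1 - p2) ** 2
    if den == 0:
        raise ValueError("parental frequency vectors are identical")
    return num / den
```

For example, with made-up frequencies, `admixture_proportion([0.18, 0.22, 0.60], [0.5, 0.3, 0.2], [0.1, 0.2, 0.7])` returns approximately 0.2, because the hybrid vector was constructed as 0.2·parent1 + 0.8·parent2. A real analysis would also need a confidence interval (e.g. by bootstrapping samples), which is why the paper reports a range rather than a point value.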

Figures

Figure 1
Phylogeny and frequency distributions of Hg E and its main subclades (panels A–G). The numbering of mutations is according to the Y Chromosome Consortium (YCC) (YCC; Jobling and Tyler-Smith 2003). To the left of the phylogeny, the ages (in 1,000 years) of the boxed mutations are reported, with their SEs (Zhivotovsky et al. 2004). Because the procedure used is based on STR data, it actually estimates the ages of STR variation observed within the corresponding haplogroup in the studied populations. With the exception of the value for the SRY4064 mutation, which has been calculated as TD (with V0=0) between the sister clades Hg E-P2 and Hg E-M33, the other values were estimated as the average squared difference (ASD) in the number of repeats between all current chromosomes of a sample and the founder haplotype. The ASD has an expected value of μt for single-step mutations (Thomas et al. 1998) and of wt for a general mutation scheme, where w is an average effective mutation rate at the loci, taken as 6.9×10⁻⁴ per 25 years (Zhivotovsky et al. 2004) (microsatellite data available on request). In some cases, because of small sample sizes or the long time elapsed since the occurrence of the mutation, the founder haplotype could not be reliably estimated as a modal haplotype. We therefore constructed it from the modal alleles at single loci, although this can underestimate the age if the candidate founder haplotype differs from the real one. To make the computation of the P2 and M35 ages independent of those of their most-represented subclades, only the STR variation observed in the “asterisk” lineages (e.g., E-P2*) was used. The M35 estimate agrees with those of Bosch et al. (2001) and Cruciani et al. (in this issue), obtained with different methods. The YAP insertion was studied as an amplified fragment-length polymorphism (Hammer and Horai 1995). The other mutations were investigated in hierarchical order by use of denaturing high-performance liquid chromatography (DHPLC) (Underhill et al. 2001). Subhaplogroups observed in this study are illustrated by continuous lines, whereas subhaplogroups discussed elsewhere are indicated by dotted lines. For simplicity, the prefix “M” was omitted from the names of the marker mutations. Haplogroup-frequency surfaces were reconstructed by computer following the kriging procedure (Delfiner 1976), using the Surfer System (Golden Software) and the data reported in table 1.
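The ASD age estimate in this caption can be sketched as follows, assuming each haplotype is a list of repeat counts in a fixed locus order. The founder is built locus by locus from the modal allele (the fallback the caption describes; breaking ties toward the smaller repeat count is an assumption), the ASD is averaged over chromosomes and loci, and the age follows from E[ASD] = wt with w = 6.9×10⁻⁴ per 25-year generation. All function names are hypothetical, and no SE is computed.

```python
from collections import Counter

# Effective mutation rate per locus per 25-year generation (value from the caption).
W_PER_GENERATION = 6.9e-4
YEARS_PER_GENERATION = 25


def founder_from_modal_alleles(haplotypes):
    """Founder haplotype built from the most common allele at each locus.

    Ties are broken toward the smaller repeat count (an assumption).
    """
    founder = []
    for locus in range(len(haplotypes[0])):
        counts = Counter(h[locus] for h in haplotypes)
        top = max(counts.values())
        founder.append(min(a for a, c in counts.items() if c == top))
    return founder


def asd_from_founder(haplotypes, founder):
    """Average squared repeat difference to the founder, over chromosomes and loci."""
    total = 0
    for h in haplotypes:
        total += sum((a - f) ** 2 for a, f in zip(h, founder))
    return total / (len(haplotypes) * len(founder))


def age_in_years(haplotypes, w=W_PER_GENERATION):
    """Age of STR variation from E[ASD] = w * t (t in generations)."""
    founder = founder_from_modal_alleles(haplotypes)
    generations = asd_from_founder(haplotypes, founder) / w
    return generations * YEARS_PER_GENERATION
```

With four made-up two-locus haplotypes `[[14, 12], [14, 13], [15, 12], [14, 12]]`, the founder is `[14, 12]` and the ASD is 0.25, giving 0.25 / 6.9×10⁻⁴ ≈ 362 generations, or roughly 9,060 years. As the caption warns, a founder assembled from single-locus modes can differ from the true ancestral haplotype and bias the age downward.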
Figure 2
Phylogeny and frequency distributions of Hg J and its main subclades (panels A–F). The numbering of mutations is according to the YCC (YCC; Jobling and Tyler-Smith 2003). To the left of the phylogeny, the ages (in 1,000 years) of the boxed mutations are reported, with their SEs (Zhivotovsky et al. 2004). With the exception of the age of the 12f2 mutation, which has been estimated as TD (with V0=0) between the combined data of the two sister clades Hg J-M267 and Hg J-M172, the other values have been determined as ASD, as described in figure 1. The 12f2a marker was examined as an RFLP by Southern blotting (Passarino et al. 1998); the other mutations were investigated in hierarchical order by use of DHPLC methodology (Underhill et al. 2001). Three new mutations, M327, M280, and M390, were found in this study. M327 is a T→C transition at np 404 within the STS containing mutation M92, M280 is a G→A transition at np 330 within the STS containing mutation M67, and M390 is an A insertion after nt 175 in the STS containing the M365 mutation. Conventions used are the same as for figure 1. The frequency surfaces were drawn using the data reported in table 2 and, for Hg J (panel A), also the data from Rosser et al. (2000), Quintana-Murci et al. (2001), and Scozzari et al. (2001).
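The frequency surfaces in figures 1 and 2 were interpolated by kriging in Surfer. As a rough illustration of what ordinary kriging does with point frequencies, here is a minimal sketch that solves the standard ordinary-kriging system at one target location. The exponential variogram, its sill and range, the zero nugget, and the use of planar coordinates in km are all assumptions (Surfer's actual settings for these maps are not stated), and the function names are hypothetical.

```python
import numpy as np


def exponential_variogram(h, sill=1.0, range_km=500.0):
    """Semivariance gamma(h) = sill * (1 - exp(-h / range_km)), zero nugget (assumed model)."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / range_km))


def ordinary_kriging(points, values, target, variogram=exponential_variogram):
    """Ordinary-kriging estimate of a haplogroup frequency at one target location."""
    pts = np.asarray(points, dtype=float)
    z = np.asarray(values, dtype=float)
    n = len(z)
    # Pairwise distances between sampled populations.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # System [Gamma 1; 1^T 0] [lambda; mu] = [gamma_0; 1]; weights sum to 1.
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = variogram(dist)
    a[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(pts - np.asarray(target, dtype=float), axis=1))
    weights = np.linalg.solve(a, b)[:n]
    return float(weights @ z)


def krige_grid(points, values, xs, ys):
    """Evaluate the kriged surface on a grid (rows follow ys, columns follow xs)."""
    return np.array([[ordinary_kriging(points, values, (x, y)) for x in xs] for y in ys])
```

With zero nugget, ordinary kriging reproduces the observed frequency exactly at each sampled location; between samples, the weights depend only on the variogram and the geometry, which is why the chosen variogram model shapes the appearance of the contour maps.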
Figure 3
Networks of the STR haplotypes of the main subhaplogroups of Hg E. These networks were obtained by the analysis of a subset of the samples for the following microsatellites: YCAIIa, YCAIIb (Mathias et al. 1994), DYS19, DYS389, DYS390, DYS391, and DYS392 (Roewer et al. 1996). The phylogenetic relationships between the microsatellite haplotypes were determined using the program NETWORK 2.0b (Fluxus Engineering). Networks were calculated by the median-joining method (ɛ=0) (Bandelt et al. 1995), weighting the STR loci according to their relative variability in Hg E and, with the exception of E-M81, after having processed the data with the reduced-median method. Circles represent the microsatellite haplotypes. Unless otherwise indicated by a number on the pie chart, the area of the circles and the area of the sectors are proportional to the haplotype frequency in the haplogroup and in the geographic area indicated by the color. The smallest circle of each network corresponds to one Y chromosome. The shaded area in E-M78 indicates the branch characterized by the DYS392-12 allele.
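The caption says the STR loci were weighted according to their relative variability before network construction, but it does not give the weighting rule. The sketch below shows only that preprocessing step, under an assumed rule: weights inversely proportional to each locus's repeat variance, scaled so the least variable polymorphic locus gets the maximum weight, with monomorphic loci also set to the maximum. It then computes weighted step distances between haplotypes, the kind of input a median-joining network is built from. It does not implement median-joining or reduced-median, and all names and the scaling constant are hypothetical.

```python
from statistics import pvariance


def locus_weights(haplotypes, max_weight=10):
    """Assumed rule: weight ~ 1 / variance, so fast-evolving loci count less."""
    n_loci = len(haplotypes[0])
    variances = [pvariance([h[i] for h in haplotypes]) for i in range(n_loci)]
    positive = [v for v in variances if v > 0]
    v_min = min(positive) if positive else 1.0
    # Least variable polymorphic locus (and any monomorphic locus) gets max_weight.
    return [max_weight * v_min / v if v > 0 else float(max_weight) for v in variances]


def weighted_distance(a, b, weights):
    """Weighted sum of absolute repeat differences (step distance) between two haplotypes."""
    return sum(w * abs(x - y) for x, y, w in zip(a, b, weights))


def pairwise_distances(haplotypes, weights):
    """Symmetric matrix of weighted step distances."""
    return [[weighted_distance(a, b, weights) for b in haplotypes] for a in haplotypes]
```

Down-weighting fast-mutating loci reduces the number of parallel and back mutations that create reticulations (loops) in the network, which is also the purpose of the reduced-median preprocessing mentioned in the caption.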
Figure 4
Networks of the STR haplotypes of the main subhaplogroups of Hg J. These networks were obtained by the analysis of a subset of the samples for the following microsatellites: YCAIIa, YCAIIb (Mathias et al. 1994), DYS388 (Thomas et al. 1999), DYS19, DYS389, DYS390, DYS391, and DYS392 (Roewer et al. 1996), by the same procedures used for Hg E (fig. 3). Apart from the YCAII system in Hg J-M267, which was considered a stable marker in this haplogroup (see text), the STR loci were weighted according to their relative variability in Hg J. The most complex networks, J-M267* and J-M172*, were calculated by the median-joining method (ɛ=0) on data preprocessed with the reduced-median method; the other networks were calculated using only the reduced-median algorithm. The shaded area in J-M267* indicates the branch characterized by the YCAIIa-22/YCAIIb-22 motif. For the areas of the circles and the sectors, see figure 3. The expansion time of this branch was calculated using TD (Zhivotovsky 2001), which gives 8.7 and 4.3 ky, respectively, for the earliest and the latest bounds of the expansion time. The former estimate was calculated using the variance in the number of repeats of the remaining six loci, assuming a variance at the beginning of population separation (V0) equal to zero, and thus gives an upper bound for the TD (Zhivotovsky 2001). The latter assumes a linear approximation of the within-population variance in repeat scores as a function of time and takes a predicted value of V0 prior to population split. Because linearity holds only for an infinite population size, and because each surviving haplogroup started from one individual and could have remained small for a long time, the linear approximation overestimates V0 and thus might be considered to give a lower bound for divergence times (L.A.Z., unpublished method).
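The reasoning behind the two bounds can be illustrated with a deliberately simplified stand-in, which is not Zhivotovsky's TD estimator: if within-group repeat variance grows linearly as V(t) ≈ V0 + w·t, then t = (V − V0)/w. Setting V0 = 0 gives the largest possible t (the upper bound), and any positive V0, such as an overestimated one from the linear approximation, shortens it (the lower bound). The sketch assumes haplotypes are lists of repeat counts and reuses w = 6.9×10⁻⁴ per 25-year generation from figure 1. The function names and any V0 value passed in are hypothetical.

```python
from statistics import pvariance

# Effective mutation rate per locus per 25-year generation (figure 1 caption).
W_PER_GENERATION = 6.9e-4


def mean_locus_variance(haplotypes):
    """Average, over loci, of the population variance in repeat number."""
    n_loci = len(haplotypes[0])
    return sum(pvariance([h[i] for h in haplotypes]) for i in range(n_loci)) / n_loci


def generations_from_variance(variance, w=W_PER_GENERATION, v0=0.0):
    """Solve V = V0 + w*t for t; v0 = 0 yields the largest (upper-bound) t."""
    if w <= 0:
        raise ValueError("w must be positive")
    return (variance - v0) / w
```

Multiplying the result by 25 converts generations to years. Comparing `generations_from_variance(v)` with `generations_from_variance(v, v0=some_positive_value)` reproduces the qualitative pattern in the caption, where the V0 = 0 estimate (8.7 ky) exceeds the estimate that uses a predicted V0 (4.3 ky).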
