Mol Biol Evol. 2015 Mar;32(3):661-73.
doi: 10.1093/molbev/msu327. Epub 2014 Dec 2.

The Y-chromosome Tree Bursts Into Leaf: 13,000 High-Confidence SNPs Covering the Majority of Known Clades

Free PMC article

Pille Hallast et al. Mol Biol Evol. 2015.


Many studies of human populations have used the male-specific region of the Y chromosome (MSY) as a marker, but MSY sequence variants have traditionally been subject to ascertainment bias. Also, dating of haplogroups has relied on Y-specific short tandem repeats (STRs), involving problems of mutation rate choice, and possible long-term mutation saturation. Next-generation sequencing can ascertain single nucleotide polymorphisms (SNPs) in an unbiased way, leading to phylogenies in which branch-lengths are proportional to time, and allowing the times-to-most-recent-common-ancestor (TMRCAs) of nodes to be estimated directly. Here we describe the sequencing of 3.7 Mb of MSY in each of 448 human males at a mean coverage of 51×, yielding 13,261 high-confidence SNPs, 65.9% of which are previously unreported. The resulting phylogeny covers the majority of the known clades, provides date estimates of nodes, and constitutes a robust evolutionary framework for analyzing the history of other classes of mutation. Different clades within the tree show subtle but significant differences in branch lengths to the root. We also apply a set of 23 Y-STRs to the same samples, allowing SNP- and STR-based diversity and TMRCA estimates to be systematically compared. Ongoing purifying selection is suggested by our analysis of the phylogenetic distribution of nonsynonymous variants in 15 MSY single-copy genes.
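The step from "branch lengths proportional to time" to a date can be illustrated with a minimal sketch. It assumes a simple rho-style estimator (mean number of SNP mutations from a node to its descendant tips, divided by the expected mutations per year over the sequenced region); the abstract does not state which estimator the authors used. The function name, the per-tip mutation counts, and the mutation rate of 1e-9 per bp per year are hypothetical placeholders; only the 3.7 Mb region size comes from the text.

```python
def rho_tmrca_years(mutations_to_tips, rate_per_bp_per_year, sequenced_bp):
    """Estimate a node's TMRCA in years from SNP counts (rho-style sketch).

    mutations_to_tips: number of SNP mutations separating the node from
        each of its descendant tips (hypothetical input).
    rate_per_bp_per_year: assumed point mutation rate (illustrative value,
        not taken from the paper).
    sequenced_bp: number of base pairs surveyed per individual.
    """
    if not mutations_to_tips:
        raise ValueError("need at least one descendant tip")
    # rho: average number of mutations accumulated since the node
    rho = sum(mutations_to_tips) / len(mutations_to_tips)
    # expected mutations per year across the whole sequenced region
    mutations_per_year = rate_per_bp_per_year * sequenced_bp
    return rho / mutations_per_year


# Three hypothetical tips, 3.7 Mb of MSY, and an assumed rate of 1e-9 per bp per year.
# A mean of 37 mutations over 3.7 Mb at this rate corresponds to roughly 10,000 years.
age = rho_tmrca_years([35, 37, 39], 1e-9, 3_700_000)
print(round(age))
```

Because the estimate scales inversely with the assumed rate, halving the rate doubles every node age, which is why the choice of mutation rate matters for SNP-based dating just as it does for STRs.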

Keywords: Y-STRs; Y-chromosome phylogeny; purifying selection; single nucleotide polymorphisms; targeted resequencing.


Fig. 1.
Distribution of sequenced regions on the MSY. At the top is shown a schematic representation of the Y chromosome and the analyzed subregion, with the distribution of the ampliconic, X-transposed, X-degenerate, and heterochromatic regions indicated (Skaletsky et al. 2003). The graph shows read depth in sequenced regions (blue) and density of discovered SNPs (red). Target coordinates for bait design (bottom) are according to GRCh37. Also shown are the locations of single-copy MSY genes (Skaletsky et al. 2003; Bellott et al. 2014), as triangles pointing in the direction of transcription. TXLNGY (Putative gamma-taxilin 2) replaces the former CYorf15A and CYorf15B (Skaletsky et al. 2003).
Fig. 2.
Venn diagram showing overlap of SNPs between NGS studies of the MSY. The total number of independent SNPs across all five studies (this study plus Francalacci et al. [2013], Poznik et al. [2013], Scozzari et al. [2014], and Wei, Ayub, Chen, et al. [2013]) is 33,479.
Fig. 3.
Maximum-parsimony tree of MSY SNP haplotypes. (a) Major haplogroups are indicated by colors, and selected haplogroup-defining mutations are indicated on branches. Deep-rooting branches have been contracted for display. The colored bar to the right indicates population group of origin: ASC: Asia, Central; ASE: Asia, East; BRI: British Isles; SCA: Scandinavia; ENW: Europe, North West; ESW: Europe, South West; ESC: Europe, South Central; ESE: Europe, South East; MNE: Middle and Near East; MEX: Mexico; AUS: Australia; AFP: Africa, food-producers; AHG: Africa, hunter-gatherers. Supplementary figure S1, Supplementary Material online, gives tips labeled with individual sample names. (b) Simplified tree showing the true lengths for deep-rooting branches. Diagonal dashed lines indicate the positions of branch contractions in part (a).
Fig. 4.
Relationship between SNP- and STR-based TMRCA estimates. SNP-based node estimates are plotted against STR-based estimates for (a) 21 STRs, (b) 17 STRs, and (c) 13 STRs, here using ASD with the “ancestral haplotype” root specification. The black dashed line in each case indicates x = y. Underlying data and correlation coefficients are given in supplementary tables S6 and S7, Supplementary Material online.
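The ASD ("average squared distance") approach with an ancestral-haplotype root, as named in the caption above, can be sketched as follows. This is a minimal illustration of the general technique, not the authors' pipeline: the function names, the toy two-locus haplotypes, the per-locus mutation rate of 0.0025 per generation, and the 30-year generation time are all assumptions chosen for the example.

```python
from statistics import mean


def asd_to_root(haplotypes, ancestral):
    """Average squared distance between sampled STR haplotypes and an
    assumed ancestral haplotype, averaged over samples and then over loci.

    haplotypes: list of haplotypes, each a list of repeat counts per locus.
    ancestral: repeat counts of the assumed ancestral haplotype.
    """
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    per_locus = [
        mean((h[locus] - ancestral[locus]) ** 2 for h in haplotypes)
        for locus in range(len(ancestral))
    ]
    return mean(per_locus)


def asd_tmrca_years(asd, rate_per_locus_per_generation, generation_years):
    """Convert ASD to a TMRCA under a stepwise mutation model, where the
    expected squared distance grows by the mutation rate each generation."""
    return asd / rate_per_locus_per_generation * generation_years


# Toy data: four samples typed at two STR loci (hypothetical values).
samples = [[14, 12], [15, 12], [13, 12], [14, 13]]
root = [14, 12]
asd = asd_to_root(samples, root)          # (0.5 + 0.25) / 2 = 0.375
age = asd_tmrca_years(asd, 0.0025, 30)    # assumed rate and generation time
print(asd, round(age))
```

Under this model the estimate depends linearly on the chosen mutation rate, and repeated forward and backward mutations at fast loci can hide older divergence, which is the saturation concern raised in the abstract.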



