A Nomenclature System for the Tree of Human Y-chromosomal Binary Haplogroups

Y Chromosome Consortium. Genome Res 12 (2), 339-48.


The Y chromosome contains the largest nonrecombining block in the human genome. By virtue of its many polymorphisms, it is now the most informative haplotyping system, with applications in evolutionary studies, forensics, medical genetics, and genealogical reconstruction. However, the emergence of several unrelated and nonsystematic nomenclatures for Y-chromosomal binary haplogroups is an increasing source of confusion. To resolve this issue, 245 markers were genotyped in a globally representative set of samples, 74 of which were males from the Y Chromosome Consortium cell line repository. A single most parsimonious phylogeny was constructed for the 153 binary haplogroups observed. A simple set of rules was developed to unambiguously label the different clades nested within this tree. This hierarchical nomenclature system supersedes and unifies past nomenclatures and allows the inclusion of additional mutations and haplogroups yet to be discovered.


Figure 1
The single most parsimonious tree of 153 haplogroups (left) showing correspondences with prior nomenclatures (right). The root of the tree is denoted with an arrow. Haplogroup names and Y Chromosome Consortium (YCC) sample numbers are given at the tips of the tree, and major clades are labeled with large capital letters and shaded in color (the entire cladogram is designated haplogroup Y). The “*” symbol indicates an internal node on the tree or paragroup (see text). For space reasons, subclade labels are entered to the left of the corresponding links. Mutation names are given along the branches; major clades are labeled with a larger font than are their subclades. The length of each branch is not proportional to the number of mutations or the age of the mutation; each subclade is given a unit of depth in the tree. Some of the branches were elongated artificially to make room for a number of phylogenetically equivalent markers on a single branch. The order of phylogenetically equivalent markers shown on each branch is arbitrary. Prior nomenclatures are named according to author and are taken from the following publications: (α) Jobling and Tyler-Smith (2000) and Kaladjieva et al. (2001); (β) Underhill et al. (2000); (γ) Hammer et al. (2001); (δ) Karafet et al. (2001); (ε) Semino et al. (2000); (ζ) Su et al. (1999); and (η) Capelli et al. (2001). Noncontiguous naming systems in prior nomenclatures result either from the use of non-PCR markers that have not been typed on the YCC panel or unpublished lineage definitions. Prior haplogroup names shown in red are found in more than one position in the phylogeny. Cross-hatching within the “Semino” nomenclature indicates lineages that cannot be named according to their system. Mutations M104 and P22 on lineage M2 are independent discoveries of the same polymorphic marker.
Figure 2
Potential examples of revisions in topology necessitated by the discovery of new mutations and new samples with intermediate haplogroups. Haplogroup nomenclature systems are shown to the right of the tree. (A) The G and H haplogroups are as shown in Figure 1. (B) Case of a newly discovered marker that joins haplogroups within haplogroup G. (C) Newly discovered mutation (μ) that splits clades within haplogroup G. (D) Case of a newly discovered sample with the derived state at M52 and the ancestral state at M69. Names shown in boxes indicate haplogroup names that require changes from those shown in A. Dotted lines indicate newly created lineages.
Figure 3
Examples of haplogroup names for cases in which subsets of markers in Figure 1 are genotyped. Markers that were not genotyped are shown with a strikethrough. The lineage- and mutation-based full nomenclature systems are shown to the right of the tree.
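The lineage-based and mutation-based naming schemes mentioned above can be illustrated with a minimal sketch. It assumes the commonly described convention in which subclade labels alternate between numbers and lowercase letters at successive depths below a major clade (for example G, G1, G1a), a trailing "*" marks a paragroup, and a mutation-based name joins the major clade letter to the terminal marker with a hyphen. The toy topology, the marker names (`m1` to `m5`), and the function names are hypothetical and are not taken from Figure 1.

```python
from string import ascii_lowercase


def label_subclades(subtree, parent_name, depth=0, names=None):
    """Assign lineage-based names to every clade below a major clade.

    `subtree` is a nested dict {marker: {child_marker: {...}}}.
    Assumed convention: suffixes alternate number, letter, number, ...
    Handles at most 26 sibling clades at letter depths.
    """
    if names is None:
        names = {}
    for i, (marker, children) in enumerate(subtree.items()):
        # Even depths get a number, odd depths get a lowercase letter.
        suffix = str(i + 1) if depth % 2 == 0 else ascii_lowercase[i]
        name = parent_name + suffix
        names[marker] = name
        label_subclades(children, name, depth + 1, names)
    return names


def paragroup_name(clade_name):
    """Name for samples in a clade that fall in none of its named subclades."""
    return clade_name + "*"


def mutation_based_name(major_clade, terminal_marker):
    """Mutation-based name: major clade letter plus the terminal marker."""
    return f"{major_clade}-{terminal_marker}"


# Hypothetical toy topology under a major clade "G"; markers are placeholders.
toy_tree = {"m1": {"m2": {}, "m3": {"m4": {}}}, "m5": {}}
names = label_subclades(toy_tree, "G")
```

In this toy tree, `names["m4"]` is `"G1b1"`, and the same lineage could be written in mutation-based form as `mutation_based_name("G", "m4")`. Because lineage-based names depend on the tree topology, they can change when new markers are discovered, as the Figure 2 scenarios show, whereas a mutation-based name changes only if the terminal marker does.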

Comment in

  • Why Names
    F Calafell et al. Genome Res 12 (2), 219-21. PMID 11827941. - Review