PLoS One, 8 (10), e75397

Indian Signatures in the Westernmost Edge of the European Romani Diaspora: New Insight From Mitogenomes


Alberto Gómez-Carballa et al. PLoS One.


In agreement with historical documentation, several genetic studies have revealed ancestral links between the European Romani and India. The entire mitochondrial DNA (mtDNA) of 27 Spanish Romani was sequenced in order to shed further light on the origins of this population. The data were analyzed together with a large published dataset (mainly hypervariable region I [HVS-I] haplotypes) of Romani (N = 1,353) and non-Romani worldwide populations (N > 150,000). Analysis of the mitogenomes allowed the characterization of several Romani-specific clades. M5a1b1a1 is the most distinctive European Romani haplogroup. It is present in all Romani groups at variable frequencies (with only sporadic occurrences in non-Romani individuals) and represents 18% of their mtDNA pool. Its phylogeographic features indicate that M5a1b1a1 originated 1.5 thousand years ago (kya; 95% CI: 1.3–1.8) in a proto-Romani population living in Northwest India. U3 is the most characteristic Romani haplogroup of European/Near Eastern origin (12.4%). Its frequency varies considerably across the continent (Iberia: ≈31%; Eastern/Central Europe: ≈13%). All U3 mitogenomes in our Iberian Romani sample fall within a new sub-clade, U3b1c, which can be dated to 0.5 kya (95% CI: 0.3–0.7). This date therefore provides a lower bound for the founder event that followed admixture in Europe/Near East. Other minor European/Near Eastern haplogroups (e.g. H24, H88a) were also assimilated into the Romani through introgression from neighboring populations during their diaspora into Europe. Yet some of these lineages are differentiated from their phylogenetically closest non-Romani counterparts. The phylogeny of Romani mitogenomes shows clear signatures of low effective population sizes and founder effects.
Overall, these results are in good agreement with historical documentation, suggesting that cultural identity and relative isolation have allowed the Romani to preserve a distinctive mtDNA heritage, with some features linking them unequivocally to their ancestral Indian homeland.
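The abstract reports clade ages with 95% confidence intervals, but this section does not state which dating method was used. As an illustrative assumption only, the sketch below uses the ρ statistic, a common way to date mitogenome clades.

- ρ is the mean number of mutations from the clade root to each sampled sequence.
- Its spread is estimated from branch-level counts: σ² = Σ n_b² m_b / n².
- The age is ρ multiplied by a calibration in years per mutation.

The `Node` structure, the toy clade and the calibration value of 1,000 years per mutation are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Node:
    """Node of a rooted clade; `muts` = mutations on the branch leading to this node."""
    muts: int = 0
    children: list[Node] = field(default_factory=list)


def n_leaves(node: Node) -> int:
    """Number of sampled sequences (leaves) at or below `node`."""
    if not node.children:
        return 1
    return sum(n_leaves(c) for c in node.children)


def branches(root: Node):
    """Yield (leaves_below, mutations) for every branch inside the clade."""
    for child in root.children:
        yield n_leaves(child), child.muts
        yield from branches(child)


def rho_age(root: Node, years_per_mutation: float) -> tuple[float, float, float]:
    """Return (age, lower, upper) in years from the rho statistic (~95% interval)."""
    n = n_leaves(root)
    br = list(branches(root))
    # rho: each branch contributes its mutations once per leaf below it
    rho = sum(nb * m for nb, m in br) / n
    # sigma^2 = sum(n_b^2 * m_b) / n^2
    sigma = sqrt(sum(nb * nb * m for nb, m in br)) / n
    lower = max(rho - 1.96 * sigma, 0.0)  # ages cannot be negative
    upper = rho + 1.96 * sigma
    return rho * years_per_mutation, lower * years_per_mutation, upper * years_per_mutation


# Hypothetical clade: one sequence 1 step from the root; two sequences share
# a 2-mutation branch, and one of them carries one further private mutation.
clade = Node(children=[Node(muts=1), Node(muts=2, children=[Node(), Node(muts=1)])])
age, lo, hi = rho_age(clade, years_per_mutation=1000.0)  # hypothetical calibration
```

With so few sequences the interval is very wide. The published estimates rest on many more mitogenomes and on an explicit mutation-rate calibration.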

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.


Figure 1. Identity-by-state (IBS) analysis of Romani and non-Romani individuals based on genome-wide SNP data.
(A) IBS values between Romani and non-Romani individuals. The non-Romani sample was taken from the same Spanish region where the Romani samples for the present study were collected. The pink dots correspond to the two pairs of Romani individuals whose IBS values are much higher than those observed between any other Romani or non-Romani individuals. (B) All pairwise IBS values between Romani and non-Romani individuals, sorted from lowest to highest. The two highest values, at the right of the panel, correspond to the two pairs of Romani individuals highlighted in panel A.
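A minimal sketch of the pairwise IBS comparison in panel B follows. The caption does not name the software used, so the following are assumptions:

- Genotypes are coded as 0/1/2 copies of a reference allele.
- Missing calls are `None` and are skipped.
- IBS is the fraction of alleles shared identical-by-state over the SNPs typed in both individuals.

The function names and toy genotypes are hypothetical.

```python
from itertools import combinations


def ibs_proportion(g1, g2):
    """Fraction of alleles shared identical-by-state between two individuals.

    Genotypes are coded 0/1/2 (copies of a reference allele); None marks missing data.
    """
    shared = total = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs missing in either individual
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this SNP
        total += 2
    return shared / total if total else float("nan")


def sorted_pairwise_ibs(genotypes):
    """All pairwise IBS values, sorted from lowest to highest (cf. panel B)."""
    pairs = [
        ((i, j), ibs_proportion(genotypes[i], genotypes[j]))
        for i, j in combinations(sorted(genotypes), 2)
    ]
    return sorted(pairs, key=lambda p: p[1])


# Hypothetical toy data: four SNPs for three individuals
genotypes = {
    "R1": [0, 1, 2, 2],
    "R2": [0, 1, 2, 1],
    "N1": [2, 1, 0, None],
}
ranked = sorted_pairwise_ibs(genotypes)
```

Here `ranked[-1]` is the pair with the highest IBS. In a real dataset, pairs sitting far above the rest of the sorted curve, like the two Romani pairs in the figure, would show up at the right end.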
Figure 2. Maximum parsimony tree of haplogroup M5 Romani mitogenomes.
The inset map shows the geographic location and sample size of all the M5 genomes observed in the Indian subcontinent. The position of the revised Cambridge Reference Sequence (rCRS) is indicated to allow reading of sequence motifs. Mitochondrial DNA variants are indicated along the branches of the phylogenetic tree. An asterisk (*) prefix indicates a position located in an overlapping region shared by two mtDNA genes. Mutations are transitions unless a suffix A, C, G, or T indicates a transversion. Other possible suffixes indicate insertions (+), synonymous substitutions (s), mutational changes in tRNA (-t), mutational changes in rRNA (-r), non-coding variants located in the mtDNA coding region (-nc), and amino acid replacements (indicated in round brackets). Underlined variants represent recurrent mutations in this tree, while a prefix '@' indicates a back mutation. The following were not considered for the phylogenetic reconstruction: mutational hotspot variants at positions 16182, 16183, and 16519, variation around position 310, and length or point heteroplasmies. The numbers in small squares attached to the haplogroup labels indicate the number of occurrences (mitogenomes) of the corresponding haplogroups found in public databases. The color of each square indicates geographic origin according to the inset legend. Complete Spanish Romani genomes obtained in this study are indicated with yellow circles. More details on the geographic or ethnic origin of all the mitogenomes used in this tree are provided in Table S1. The Indian genome FJ383591 appears to belong to M5a1b1a, although it lacks four diagnostic sites, most likely due to sequencing or documentation errors.
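The variant notation described in the caption can be read mechanically. The sketch below parses one branch label into its components and applies the caption's exclusion rule. It makes the following assumptions:

- The token grammar is hypothetical, in particular the order in which prefixes and suffixes combine.
- The window used for "variation around position 310" (positions 303 to 315) is assumed.
- Underlining (recurrence) and heteroplasmy cannot be recovered from plain text and are not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical token grammar:
# [@][*]position[+inserted bases][A|C|G|T][s][-t|-r|-nc][(amino acid change)]
TOKEN = re.compile(
    r"^(?P<back>@)?(?P<overlap>\*)?(?P<pos>\d+)"
    r"(?P<ins>\+[ACGT]*)?(?P<tv>[ACGT])?(?P<syn>s)?"
    r"(?:-(?P<rna>t|r|nc))?\s*(?:\((?P<aa>[^)]+)\))?$"
)

# Hotspot positions that the caption says were ignored
HOTSPOTS = {16182, 16183, 16519}
# Assumed bounds for "variation around position 310" (hypothetical)
AROUND_310 = range(303, 316)


@dataclass
class Variant:
    pos: int
    kind: str  # "transition", "transversion" or "insertion"
    back_mutation: bool  # '@' prefix
    overlap: bool  # '*' prefix: region shared by two genes
    synonymous: bool  # 's' suffix
    rna: Optional[str]  # "t" (tRNA), "r" (rRNA), "nc" (non-coding) or None
    aa_change: Optional[str]  # text inside round brackets


def parse_variant(token: str) -> Variant:
    m = TOKEN.match(token.strip())
    if m is None:
        raise ValueError(f"unrecognized variant token: {token!r}")
    if m["ins"]:
        kind = "insertion"
    elif m["tv"]:
        kind = "transversion"  # a base suffix marks a transversion
    else:
        kind = "transition"  # default when no suffix is given
    return Variant(
        pos=int(m["pos"]),
        kind=kind,
        back_mutation=m["back"] is not None,
        overlap=m["overlap"] is not None,
        synonymous=m["syn"] is not None,
        rna=m["rna"],
        aa_change=m["aa"],
    )


def used_in_tree(v: Variant) -> bool:
    """False for the hotspots and the 310 region excluded from the reconstruction."""
    return v.pos not in HOTSPOTS and v.pos not in AROUND_310
```

For example, `parse_variant("@16311")` yields a transition flagged as a back mutation. `used_in_tree(parse_variant("16519"))` is `False`.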
Figure 3. Maximum parsimony tree of M5a1b1a1 HVS-I sequences.
Population codes are as follows: ALB = Albania; BOS = Bosnia; BRA = Brazil; BUL = Bulgaria; CRO = Croatia; CRZ = Czech Republic; CUB = Cuba; FIN = Finland; FRA = France; GER = Germany; GRE = Greece; HUN = Hungary; IND = India; ITA = Italy; LIT = Lithuania; PAK = Pakistan; POL = Poland; POR = Portugal; RUS = Russia; SLO = Slovakia; SPA = Spain; USA = United States of America. See Table S3 for detailed geographic information on these haplotypes, and the caption to Figure 2 for more information on the features of the tree.
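The trees in these figures are maximum parsimony trees. The captions do not say how they were built, so the sketch below only illustrates the underlying criterion. It uses Fitch's small-parsimony algorithm to count the minimum number of changes a fixed rooted binary tree requires for a set of aligned sequences. A maximum parsimony tree is one that minimizes this total. The tree encoding and the toy 3-site fragments are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf is a sample name, an internal node is a (left, right) pair
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's small-parsimony pass for one alignment column.

    Returns the candidate state set at the root of `tree` and the minimum number of changes.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    ls, lc = fitch_site(left, states)
    rs, rc = fitch_site(right, states)
    common = ls & rs
    if common:
        return common, lc + rc
    return ls | rs, lc + rc + 1  # disjoint state sets cost one extra change


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Total changes over all columns; a maximum parsimony tree minimizes this."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical 3-site fragments for four samples
alignment = {"s1": "TCA", "s2": "TCA", "s3": "CCA", "s4": "CTG"}
tree = (("s1", "s2"), ("s3", "s4"))
score = parsimony_score(tree, alignment)
```

Grouping s1 with s3 instead, `(("s1", "s3"), ("s2", "s4"))`, needs one more change on these data, so the first topology is the more parsimonious of the two.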
Figure 4. Map showing the frequency of haplogroup M5a1b1a1 control region sequences (pie charts) in different European Romani groups.
The inset map represents the ultimate origin of this clade in India. The numbers in the green circles represent the occurrences of M5a1b1a in non-Romani individuals in Eurasia (see Table S3 for references): 24 in Europe and 9 in India. The European Romani groups (red squares on the map) are numbered as follows: 1 = Bulgaria; 2 = Croatia; 3 = Hungary; 4 = Lithuania; 5 = Poland; 6 = Slovakia; 7 = Portugal; 8 = Málaga (Southern Spain); 9 = Madrid (Central Spain); 10 = Barcelona (Northeastern Spain).
Figure 5. Maximum parsimony tree of haplogroup M35 mitogenomes.
The inset map shows the geographic location and sample size of all the M35 genomes observed in the Indian subcontinent. See the caption to Figure 2 for more information on the features of the tree.
Figure 6. Maximum parsimony tree of the Spanish Romani mitogenomes analyzed in the present study, excluding those belonging to haplogroups M5 (Figure 2) and U3 (Figure 7).
See the caption to Figure 2 for more information on the features of the tree.
Figure 7. Maximum parsimony tree of haplogroup U3 mitogenomes.
See the caption to Figure 2 for more information on the features of the tree.
Figure 8. Mitochondrial DNA haplogroup frequencies.
(A) European Romani populations; (B) Iberian Romani; (C) European Romani excluding those from Iberia. Note that HV(×H) represents all haplogroups within HV excluding the H branch; L represents all mtDNA clades excluding macro-haplogroups M and N; and the category ‘other’ represents a paragroup that includes all of the haplotypes that could not be unambiguously assigned to any of the other categories considered in the figure.
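The three panels differ only in which populations are pooled before frequencies are computed. The sketch below is a minimal version of that calculation under the following assumptions:

- Samples are given as hypothetical `(population, haplogroup)` records.
- A plain dictionary stands in for the phylogenetic assignment of haplogroups to figure categories. Real categories such as HV(×H) or L require phylogenetic knowledge, not string matching.
- Anything unassigned falls into 'other', mirroring the legend.

```python
from collections import Counter


def haplogroup_frequencies(records, categories, include=lambda pop: True):
    """Relative frequency of each haplogroup category among the selected populations.

    records: (population, haplogroup) pairs; categories: haplogroup -> figure category.
    Haplogroups missing from `categories` fall into 'other', as in the figure legend.
    """
    selected = [hg for pop, hg in records if include(pop)]
    counts = Counter(categories.get(hg, "other") for hg in selected)
    n = len(selected)
    return {cat: c / n for cat, c in counts.most_common()}


# Hypothetical toy records and category mapping
records = [
    ("Spain", "M5a1b1a1"), ("Spain", "U3b1c"), ("Portugal", "U3b1c"),
    ("Hungary", "M5a1b1a1"), ("Hungary", "H24"),
]
categories = {"M5a1b1a1": "M5", "U3b1c": "U3"}
IBERIA = {"Spain", "Portugal"}

panel_a = haplogroup_frequencies(records, categories)                         # all Romani
panel_b = haplogroup_frequencies(records, categories, lambda p: p in IBERIA)  # Iberia only
panel_c = haplogroup_frequencies(records, categories, lambda p: p not in IBERIA)
```

Splitting by region this way is what exposes contrasts such as the higher U3 frequency in Iberia than in Eastern/Central Europe.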


Grant support

The research leading to these results has received funding from the "Ministerio de Ciencia e Innovación" (SAF2008-02971) and from the Plan Galego IDT, Xunta de Galicia (EM 2012/045) (A.S.). It has also received funding from the Consellería de Sanidade/Xunta de Galicia (RHI07/2-intensificación actividad investigadora and 10PXIB918184PR), the Instituto Carlos III (Intensificación de la actividad investigadora) and the Fondo de Investigación Sanitaria (FIS; PI070069 and PI1000540) of the Spanish National R&D&I Plan (Plan Nacional de I+D+I), and from FEDER funds ("fondos FEDER") (F.M.T.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.