J Mol Evol. 2021 Jul;89(6):396-414.
doi: 10.1007/s00239-021-10012-6. Epub 2021 Jun 7.

Dynamic Molecular Evolution of Mammalian Homeobox Genes: Duplication, Loss, Divergence and Gene Conversion Sculpt PRD Class Repertoires

Thomas D Lewin et al. J Mol Evol. 2021 Jul.

Abstract

The majority of homeobox genes are highly conserved across animals, but the eutherian-specific ETCHbox genes, embryonically expressed and highly divergent duplicates of CRX, are a notable exception. Here we compare the ETCHbox genes of 34 mammalian species, uncovering dynamic patterns of gene loss and tandem duplication, including the presence of a large tandem array of LEUTX loci in the genome of the European rabbit (Oryctolagus cuniculus). Despite extensive gene gain and loss, all sampled species possess at least two ETCHbox genes, suggesting their collective role is indispensable. We find evidence for positive selection and show that TPRX1 and TPRX2 have been the subject of repeated gene conversion across the Boreoeutheria, homogenising their sequences and preventing divergence, especially in the homeobox region. Together, these results are consistent with a model in which mammalian ETCHbox genes evolve dynamically because of functional overlap, yet collectively have indispensable roles.

Keywords: ETCHbox; Genome evolution; Homeodomain; Positive selection; Tandem duplication.

Conflict of interest statement

The authors have no conflicts of interest to declare.

Figures

Fig. 1
ETCHbox repertoires of Homo sapiens (humans) and Bos taurus (cattle), with gene structures as determined using transcriptome assemblies. Horizontal grey bars represent chromosomes; vertical black bars represent the genomic positions of ETCHbox genes. For gene structure representations, coding regions are shown in black, homeoboxes in colour. Untranslated regions (UTRs) are not shown. Black scale bars at the 3’ end of genes = 100 bp. DPRX, LEUTX, TPRX1, TPRX2 (and B. taurus TPRX3) form a loose cluster on a single chromosome (B. taurus chromosome 18, H. sapiens chromosome 19); ARGFX has translocated to another chromosome (B. taurus chromosome 1, H. sapiens chromosome 3). TPRX1 and TPRX2 are located on either side of the ETCHbox ‘ancestor’ CRX.
Fig. 2
Bayesian gene tree of putatively functional ETCHbox genes identified in this work. Colours highlight ETCHbox gene families; labels show posterior probabilities. The ARGFX, DPRX, LEUTX and PARGFX clades are supported by ≥ 99% probabilities. Due to the limited length of the homeodomain (60 amino acids), gene phylogenies do not always recapitulate known relationships between species. TPRX duplicates in Cetartiodactyla are referred to as TPRX3. The TPRX1 and TPRX2 genes of Mus musculus and Peromyscus leucopus are referred to as Crxos and Obox, respectively, reflecting their extensive sequence change compared to the ancestral TPRX genes. Abbreviations: Bind = Bos indicus, Bmus = Balaenoptera musculus, Btau = Bos taurus, Ccan = Castor canadensis, Ccri = Condylura cristata, Cfer = Camelus ferus, Cjac = Callithrix jacchus, Clf = Canis lupus familiaris, Ecab = Equus caballus, Fcat = Felis catus, Gvar = Galeopterus variegatus, Hsap = Homo sapiens, Lcan = Lynx canadensis, Llut = Lutra lutra, Mjav = Manis javanica, Merm = Mustela erminea, Mmon = Monodon monoceros, Mmul = Macaca mulatta, Mmur = Microcebus murinus, Mmus = Mus musculus, Mmyo = Myotis myotis, Nleu = Nomascus leucogenys, Oari = Ovis aries, Ocun = Oryctolagus cuniculus, Pabe = Pongo abelii, Pdis = Phyllostomus discolor, Pleu = Peromyscus leucopus, Psin = Phocoena sinus, Rfer = Rhinolophus ferrumequinum, Sscr = Sus scrofa, Svul = Sciurus vulgaris, Tbc = Tupaia belangeri chinensis, Uame = Ursus americanus, Zcal = Zalophus californianus
Fig. 3
ETCHbox gene repertoires of 34 eutherian mammals. Phylogenetic relationships are based on TimeTree (Kumar et al. 2017). Coloured boxes = putatively functional gene. Multiple boxes = gene duplicates. Black X = no gene remnants (complete gene loss). Grey boxes = putative pseudogene; grey boxes with a black question mark = complete homeodomain but subsequent frameshift or premature stop codon. White boxes with a question mark = unclear functional status due to incomplete assembly in the region. Grey triangles = tandem single exons. Brackets = polymorphism; question marks = assembly gap such that gene presence or absence cannot be determined. HD = homeodomain. Species abbreviations as in Fig. 2
Fig. 4
Bayesian phylogenies of putatively functional TPRX1, TPRX2 and TPRX3 full gene sequences (a) and homeoboxes (b). Blue boxes highlight cases where conspecific TPRX1 and TPRX2 pairs are more closely related to each other than to other sequences. Pink boxes highlight cases that appear on tree b but not tree a. Blue dots mark putative gene conversion events that occurred deeper in the phylogeny. Putative pseudogenes were excluded. Labels show posterior probabilities. Species abbreviations as in Fig. 2
Fig. 5
Sequence similarity between TPRX1 and TPRX2 genes within a species. Plots show the Kimura 2-parameter (K2P) distance in 50 bp sliding windows between conspecific TPRX1 and TPRX2 genes. Higher K2P values indicate more divergent sequences. Gaps in the trace indicate indels in the alignment. The black bar marked ‘Hbox’ demarcates the position of the homeobox in each alignment. For many species, the K2P values increase towards the 3’ end of the gene, suggesting that gene conversion has homogenised the TPRX genes less extensively at their 3’ ends. Putative pseudogenes were excluded. Species abbreviations as in Fig. 2
Fig. 6
Bayesian phylogenies inferred using partition 1 (a) and partition 2 (b) of putatively functional TPRX1 and TPRX2 genes split at the gene conversion breakpoint identified by GARD. Boxes highlight the Bovidae, Cetacea and Primates, where topology differs markedly between the two trees. For example, Tree b is consistent with a gene conversion event at the base of the Primates; Tree a has conspecific pairs of TPRX genes consistent with additional more recent gene conversion events in the ancestors of these species within the Primates. Species abbreviations as in Fig. 2
Fig. 7
Models of ETCHbox homeodomain structures including sites under positive selection. Homeodomains of human ETCHbox proteins (blue) are modelled in complex with DNA (grey). Residues under positive selection are coloured red. Amino acid side chains are shown for sites under positive selection only. Letters show the identity of positively selected residues in human, numbers show their position within the homeodomain. TPRX1 and TPRX2 homeodomains are identical due to gene conversion so only one is shown
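
The sliding-window comparison described in the Fig. 5 caption can be illustrated with a minimal sketch. It computes the standard Kimura 2-parameter distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences, over windows of a pairwise alignment. The function names, the step size of 1 bp, and the handling of gapped columns (skipped, with a window reported as undefined when no comparable sites remain) are assumptions made for illustration; this is not the authors' pipeline.

```python
import math

PURINES = {"A", "G"}
BASES = {"A", "C", "G", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Returns None when the distance is undefined (no comparable sites,
    or the sequences are too divergent for the logarithm to exist).
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        return None
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        return None
    return -0.5 * math.log(term1 * math.sqrt(term2))


def sliding_k2p(seq1, seq2, window=50, step=1):
    """K2P distance for each window along a pairwise alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    return [
        k2p_distance(seq1[i:i + window], seq2[i:i + window])
        for i in range(0, len(seq1) - window + 1, step)
    ]
```

Calling `sliding_k2p(aligned_tprx1, aligned_tprx2)` on two aligned sequences of equal length (hypothetical variable names) returns one value per window. Under the assumptions above, `None` entries would correspond to breaks in the plotted trace.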
