, 31 (10), 2592-611

Enigmatic Orthology Relationships Between Hox Clusters of the African Butterfly Fish and Other Teleosts Following Ancient Whole-Genome Duplication



Kyle J Martin et al. Mol Biol Evol.


Numerous ancient whole-genome duplications (WGD) have occurred during eukaryote evolution. In vertebrates, duplicated developmental genes and their functional divergence have had important consequences for morphological evolution. Although two vertebrate WGD events (1R/2R) occurred over 525 Ma, we have focused on the more recent 3R or TGD (teleost genome duplication) event which occurred approximately 350 Ma in a common ancestor of over 26,000 species of teleost fishes. Through a combination of whole genome and bacterial artificial chromosome clone sequencing we characterized all Hox gene clusters of Pantodon buchholzi, a member of the early branching teleost subdivision Osteoglossomorpha. We find 45 Hox genes organized in only five clusters indicating that Pantodon has suffered more Hox cluster loss than other known species. Despite strong evidence for homology of the five Pantodon clusters to the four canonical pre-TGD vertebrate clusters (one HoxA, two HoxB, one HoxC, and one HoxD), we were unable to confidently resolve 1:1 orthology relationships between four of the Pantodon clusters and the eight post-TGD clusters of other teleosts. Phylogenetic analysis revealed that many Pantodon genes segregate outside the conventional "a" and "b" post-TGD orthology groups, that extensive topological incongruence exists between genes physically linked on a single cluster, and that signal divergence causes ambivalence in assigning 1:1 orthology in concatenated Hox cluster analyses. Out of several possible explanations for this phenomenon we favor a model which keeps with the prevailing view of a single TGD prior to teleost radiation, but which also considers the timing of diploidization after duplication, relative to speciation events. We suggest that although the duplicated hoxa clusters diploidized prior to divergence of osteoglossomorphs, the duplicated hoxb, hoxc, and hoxd clusters concluded diploidization independently in osteoglossomorphs and other teleosts. 
We use the term "tetralogy" to describe the homology relationship which exists between duplicated sequences which originate through a shared WGD, but which diploidize into distinct paralogs from a common allelic pool independently in two lineages following speciation.

Keywords: Pantodon; WGD; diploidization; homeobox; teleost; tetraploidy.


Fig. 1.
WGDs and the key phylogenetic position of Pantodon buchholzi. (A) Cladogram depicting chordate phylogeny and biodiversity with the position of WGD events shown (1R/2R, TGD/3R). Osteoglossomorpha (Clade 14) branched from other teleosts near the root of the Teleostei (Clade 13) and is therefore a useful group for deep genomic comparisons of the TGD event with teleost species whose genomes have been well characterized within the Clupeocephala (Clade 16). Other major chordate clades are also illustrated for perspective. 1: Cephalochordata, 2: Tunicata, 3: Craniata, 4: Agnatha, 5: Gnathostomata, 6: Chondrichthyes, 7: Osteichthyes, 8: Sarcopterygii, 9: Actinopterygii, 10: Polypteriformes, 11: Acipenseriformes, 12: Holostei, 13: Teleostei, 14: Osteoglossomorpha, 15: Elopomorpha, 16: Clupeocephala, 17: Ostarioclupeomorpha, 18: Euteleostei. Representative species with available genomic data used in our analyses are listed. (B) Adult male specimen of the African freshwater butterflyfish, P. buchholzi.
Fig. 2.
The Hox gene complement of Pantodon buchholzi. With only five clusters (hoxax, hoxbx, hoxby, hoxcx, and hoxdx) containing a total of 45 full-length protein-coding Hox genes, Pantodon possesses fewer Hox gene clusters than any other known teleost, but retains a similar total number of individual genes. Exons of predicted Hox genes are depicted in black and to scale, alongside conserved microRNA genes in the mir-196 and mir-10 families, flanking Evx family genes, and the most proximate nonhomeobox flanking genes or pseudogenes (gray boxes) we could identify in the region we sequenced. The IDs of the BAC clones sequenced to assemble each of these scaffolds are noted beneath each cluster. The total sizes of the assembled loci, and the sizes of the region bounded by the most 5′- and 3′-Hox gene-coding exons for each cluster, respectively, are: hoxax (199,116/55,231 bp), hoxbx (379,217/66,149 bp), hoxby (268,422/69,795 bp), hoxcx (246,971/93,161 bp), and hoxdx (254,542/45,374 bp).
Fig. 3.
The Hox genes of Pantodon buchholzi do not reliably segregate with the post-TGD “a” and “b” orthology groups of other teleosts in individual unconstrained vertebrate Hox gene trees. Phylogenetic trees were computed using ML and Bayesian methods with the Hox gene-coding sequences of Pantodon, other teleosts (eel, zebrafish, salmon, medaka, stickleback, Tetraodon, Takifugu, and Astatotilapia), sarcopterygians (coelacanth, Xenopus, Anolis, human, and mouse), and elephant shark. Support values corresponding to clades containing Pantodon sequences are plotted as two columns above each individual orthology-informative gene in the Hox cluster schematic. The height of each column corresponds to either the bootstrap support value (left column) or the posterior probability (right column). The column color corresponds to the clade containing the Pantodon sequence. We observe that across a single cluster, Hox genes where the Pantodon sequence clusters best with clupeocephalan post-TGD “a” orthologs (red), are interleaved with those which cluster best with the post-TGD “b” orthologs (blue), or outside the combined post-TGD “a” and “b” clades with nonteleost outgroups (green). Polytomies between the Pantodon sequence, the pre-TGD outgroups, and post-TGD “a” and “b” clades occurring in the Bayesian trees are indicated with a letter “p.” Under a classic pan-teleost TGD model we would expect all Pantodon genes on each Hox cluster to segregate with either the “a” or the “b” orthology groups rather than a mixture, and no sequences which segregate best as an outgroup to the TGD node.
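The expectation stated at the end of this caption can be written as a simple check. The sketch below is illustrative only and is not the authors' analysis: the function name, the label strings ("a", "b", "outgroup", "polytomy"), and the placeholder gene names are assumptions, and treating polytomies as uninformative is a choice made here, not one stated in the caption.

```python
def fits_classic_tgd(assignments):
    """Check one cluster against the classic pan-teleost TGD expectation.

    `assignments` maps a gene name to the clade its sequence groups with:
    "a", "b", "outgroup", or "polytomy" (hypothetical labels).
    Returns True only if every informative gene falls in the same
    post-TGD group and none groups outside the TGD node.
    """
    # Assumption: unresolved polytomies carry no orthology signal.
    informative = [g for g in assignments.values() if g != "polytomy"]
    if "outgroup" in informative:
        return False
    # A cluster with no informative genes is reported as not fitting.
    return len(set(informative)) == 1


# Placeholder input, not data from the paper.
mixed_cluster = {"gene1": "a", "gene2": "b", "gene3": "outgroup"}
print(fits_classic_tgd(mixed_cluster))
```

A cluster whose genes interleave "a", "b", and outgroup placements, as described for most Pantodon clusters, would fail this check.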
Fig. 4.
3D-SLRP using vertebrate whole Hox cluster concatenations reveals strong cluster-wide conflict between individual sites supporting each of three alternative hypotheses of Pantodon Hox cluster homology. Three different constrained tree topologies representing different hypotheses of Pantodon Hox cluster homology were compared. Topology A: ((post-TGD “a” + Pantodon)(post-TGD “b”))(pre-TGD outgroups), topology B:((post-TGD “a”)(post-TGD “b” + Pantodon))(pre-TGD outgroups), and topology O:((post-TGD “a”)(post-TGD “b”))(pre-TGD outgroups + Pantodon) were each compared in a pairwise fashion under a ML framework to model the support for each hypothesis of homology across each site in whole Hox cluster-concatenated alignments. (A) Schematized outline of a site-likelihood ratio plot showing regions in the graph which support each topology. Actual site-likelihood ratio plots for each Pantodon cluster are shown for hoxax (B), hoxbx (C), hoxby (D), hoxcx (E), and hoxdx (F). Each axis plots the site-wise likelihood ratio difference between one pair of competing topologies. The x axis plots the likelihood ratio between topology A and topology B (δ1), the y axis plots the likelihood ratio between topology A and topology O (δ2), and on the z axis the likelihood ratio between topology B and topology O (δ3) is plotted. Each point represents a single amino acid site. Sites are colored if the absolute magnitude of the corresponding site-likelihood ratio is more than 2 SD greater than the mean. Sites which support topology A are colored in red, sites which support topology B are blue, whereas sites which support topology O are green. Except for hoxax, which only contains sites which support topology A, there is conflicting phylogenetic signal in the Hox clusters of Pantodon which prevents unambiguous assignment to either the clupeocephalan teleost post-TGD “a” or “b” orthology groups.
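The three axes described in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' 3D-SLRP implementation. It assumes that each "likelihood ratio" is a difference of per-site log-likelihoods already computed under topologies A, B, and O, and it reads the coloring rule as "the absolute value exceeds the mean of the absolute values by more than 2 sample SD"; both readings, and all function names, are assumptions.

```python
import statistics


def site_likelihood_ratios(lnl_a, lnl_b, lnl_o):
    """Return one (d1, d2, d3) tuple per alignment site.

    d1 = lnL_A - lnL_B, d2 = lnL_A - lnL_O, d3 = lnL_B - lnL_O,
    so that d2 = d1 + d3 at every site under this definition.
    """
    return [(a - b, a - o, b - o) for a, b, o in zip(lnl_a, lnl_b, lnl_o)]


def flag_outliers(values, n_sd=2.0):
    """Flag sites whose |value| is more than n_sd SD above the mean |value|."""
    magnitudes = [abs(v) for v in values]
    threshold = statistics.fmean(magnitudes) + n_sd * statistics.stdev(magnitudes)
    return [m > threshold for m in magnitudes]


# Toy per-site log-likelihoods, not data from the paper.
ratios = site_likelihood_ratios([-1.0, -2.0], [-1.5, -1.0], [-3.0, -2.5])
print(ratios)
```

Under this sign convention a positive d1 favors topology A over B, a positive d2 favors A over O, and a positive d3 favors B over O.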
Fig. 5.
The evolution of teleost Hox gene clusters following the TGD outlining the relative timing of diploidization events and the speciation of major teleost subdivisions. This model of Hox cluster evolution in teleosts illustrates the independent diploidization of Hox clusters following the TGD and the relative timing of cluster diploidization and speciation events. Each line in the phylogram represents an allele and the separation of pairs of lines accompanied with a change in color represents the completion of the diploidization of this locus. In this model, the duplicated hoxa clusters diploidize first, before the last common ancestor of all teleosts. The remaining clusters diploidize later, and independently in Clupeocephala and Osteoglossomorpha. Whole Hox cluster losses (black triangles) are also mapped highlighting the massive Hox cluster losses in the Pantodon lineage.
Fig. 6.
Types of homology relationships following WGD. Schematic outlining the types of homology relationships which can exist between gene duplicates as a result of WGD, taking into account the relative timing of diploidization. (A) The classical model where both WGD and full diploidization (DIP.) occur before speciation (SPE.) results in two types of homology relationships: Orthology (e.g., Sp1 GENE1b and Sp2 GENE1b) and paralogy (e.g., Sp1 GENE1a and Sp2 GENE1b). (B) The classical alternative scenario to the model presented in (A), where speciation occurs before two independent WGD and subsequent diploidization, only results in the formation of paralogs (e.g., Sp1 GENE1a and Sp2 GENE1x). (C) A new scenario where a single WGD occurs prior to speciation but where diploidization occurs independently in each lineage can result in a novel type of homology relationship: Tetralogy (e.g., Sp1 GENE1a and Sp1 GENE1b are tetralogous with Sp2 GENE1x and Sp2 GENE1y). When the tetraploid (4n) that results from duplication in a diploid (2n) ancestor speciates prior to full diploidization, recombination will cease independently and loci will make the transition from 4n alleles to 2n paralogs separately in each lineage. Shared derived mutations will then be able to accumulate independently between each pair of duplicates in each lineage, and no single duplicate in one lineage can be considered orthologous with a single duplicate in the other.
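The three scenarios in this caption differ only in the relative order of duplication (WGD), diploidization (DIP), and speciation (SPE). The sketch below encodes that ordering as a lookup. It is a simplified illustration with assumed names: the function and the event strings are hypothetical, and it reports only the between-species relationships named in panels A to C.

```python
def between_species_homology(order):
    """Map an event order to the between-species homology types of Fig. 6.

    `order` is a sequence containing "WGD", "DIP", and "SPE" exactly once,
    listed from earliest to latest (hypothetical encoding).
    """
    if sorted(order) != ["DIP", "SPE", "WGD"]:
        raise ValueError("order must contain WGD, DIP and SPE exactly once")
    wgd, dip, spe = (order.index(e) for e in ("WGD", "DIP", "SPE"))
    if dip < wgd:
        raise ValueError("diploidization cannot precede duplication")
    if spe < wgd:
        # Panel B: independent duplications after speciation.
        return {"paralogy"}
    if dip < spe:
        # Panel A: duplication and diploidization both precede speciation.
        return {"orthology", "paralogy"}
    # Panel C: shared duplication, independent diploidization.
    return {"tetralogy"}


print(between_species_homology(["WGD", "SPE", "DIP"]))
```

The last branch corresponds to the model proposed for the hoxb, hoxc, and hoxd clusters, where diploidization concludes separately in osteoglossomorphs and other teleosts.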


