Genome Res, 18 (12), 1944-54

Unraveling Ancient Hexaploidy Through Multiply-Aligned Angiosperm Gene Maps

Haibao Tang et al. Genome Res.

Abstract

Large-scale (segmental or whole) genome duplication has been recurring in angiosperm evolution. Subsequent gene loss and rearrangements further affect gene copy numbers and fractionate ancestral gene linkages across multiple chromosomes. The fragmented "multiple-to-multiple" correspondences resulting from this distinguishing feature of angiosperm evolution complicate comparative genomic studies. Using a robust computational framework that combines information from multiple orthologous and duplicated regions to construct local syntenic networks, we show that a shared ancient hexaploidy event (or perhaps two roughly concurrent genome fusions) can be inferred based on the sequences from several divergent plant genomes. This "paleo-hexaploidy" clearly preceded the rosid-asterid split, but it remains equivocal whether it also affected monocots. The model resulting from our multi-alignments lays the foundation for approximating the number and arrangement of genes in the last universal common ancestor of angiosperms. Comparative analysis of inferred homologous genes derived from this model shows patterns of preferential gene retention or loss after polyploidy and reveals large variability of nucleotide substitution rates among plant nuclear genomes.

Figures

Figure 1.
Flow-chart of MCscan core algorithm.
Figure 2.
Collinearity between triplicate Vitis γ-homeologous regions with BAC sequences from Solanum (A) and Musa (B). (Black glyphs) Genes with the tip showing the transcriptional direction; (gray shades) synteny matches between a Vitis gene and Solanum or Musa sequences.
Figure 3.
Topologies for five proximal γ ancestral loci that contain three collinear Vitis genes. Vitis gene names are abbreviated as “[chromosome].[gene index]” for graphing. Each tree was rooted using one best-matching moss gene, identified by JGI protein accession number. The numbers above branches are bootstrap values in the phylogenetic reconstruction. A total of 10 local blocks with more than five triplets in Carica and Vitis were studied in the same way. Phylogenetic analysis was performed using PHYLIP version 3.67 (Retief 2000). The analysis was carried out using the protdist program (default parameters) followed by neighbor-joining using neighbor. We used the seqboot program to simulate 100 bootstrap replicates and the consense program to retrieve one consensus tree.
Figure 4.
(A,B) Distribution of Ks distances among Carica, Populus, Vitis, and Arabidopsis paleologs. Ks values are grouped into bins of 0.1 intervals. Certain Ks intervals are highlighted as they correspond to several presumed whole-genome duplication events. Dotted lines are fitted mixtures of log-normal distributions for the paleolog Ks distributions (see Methods). (C) Distribution of 4DTV distance among paleologs in the same four eudicot lineages. (D) Phylogeny of single-copy ortholog set used in relative rate estimates. A total of 47 orthologous genes that are single copy in all five species were used in the analysis. Protein alignments for each ortholog group were constructed and then used to guide DNA alignments. The alignments were then concatenated, yielding 53,856 aligned nucleotide positions. Per-site Ks values on each branch were estimated by codeml in the PAML package (Yang 1997) using a constrained topology that reflects organismal relationships.
Figure 5.
Phylogenetic analysis of ancestral loci N01482 (A) and N01483 (B). Coding sequences of all members in four eudicot species for each ancestral locus (19 genes in N01482, 21 in N01483) were aligned by CLUSTALW (Thompson et al. 1994) using parameters suggested by Hall (2007). Phylogenetic relationships among the members were inferred, and sequences were grouped into clades, using MrBayes (Ronquist and Huelsenbeck 2003). The Bayesian analysis was carried out for 500,000 generations using the General Time Reversible plus Gamma (GTR+G) substitution model selected based on MODELTEST (Posada and Crandall 1998). All branches with support <50% are collapsed into a polytomy. A majority tree is presented in both cases. The gene names for Carica, Populus, and Vitis are recoded to reflect relative orders on chromosome or scaffold (see Methods). The conversions from the original locus identifiers to the re-indexed gene names are available as a conversion table in Supplemental Data 4. In case the original gene identifiers are subject to future changes, the conversion table will be updated accordingly. Arabidopsis gene names follow their standard TAIR locus IDs. Scale bars represent the number of substitutions per site following the GTR+G model.
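
The Figure 4 caption states that Ks values are grouped into bins of 0.1 intervals. A minimal sketch of that binning step is given here, assuming the Ks values are available as a plain list of non-negative floats. The function name `bin_ks` and the sample values are hypothetical and are not taken from the paper; the log-normal mixture fitting mentioned in the caption is not covered.

```python
import math
from collections import Counter

def bin_ks(ks_values, width=0.1):
    """Count Ks values per bin of the given width.

    Returns a dict mapping bin index -> count, where bin index i covers
    the half-open interval [i * width, (i + 1) * width).
    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for ks in ks_values:
        if ks < 0:
            raise ValueError("Ks values must be non-negative")
        # A tiny epsilon guards against floating-point error, e.g.
        # 0.3 / 0.1 evaluating to slightly less than 3.
        index = math.floor(ks / width + 1e-9)
        counts[index] += 1
    return dict(sorted(counts.items()))

# Illustrative values only (not data from the paper).
histogram = bin_ks([0.05, 0.12, 0.18, 0.3, 1.25])
for index, count in histogram.items():
    print(f"[{index * 0.1:.1f}, {(index + 1) * 0.1:.1f}): {count}")
```

Integer bin indices are used as keys instead of floating-point bin edges, which avoids keys such as 0.30000000000000004 when the histogram is later inspected or merged.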
