Independent sorting-out of thousands of duplicated gene pairs in two yeast species descended from a whole-genome duplication

Proc Natl Acad Sci U S A. 2007 May 15;104(20):8397-402. doi: 10.1073/pnas.0608218104. Epub 2007 May 9.


Among yeasts that underwent whole-genome duplication (WGD), Kluyveromyces polysporus represents the lineage most distant from Saccharomyces cerevisiae. By sequencing the K. polysporus genome and comparing it with the S. cerevisiae genome using a likelihood model of gene loss, we show that these species diverged very soon after the WGD, when their common ancestor contained >9,000 genes. The two genomes subsequently converged onto similar current sizes (approximately 5,600 protein-coding genes each) and independently retained sets of duplicated genes that are strikingly similar. Almost half of their surviving single-copy genes are not orthologs but paralogs formed by WGD, as would be expected if most gene pairs were resolved independently. In addition, by comparing the pattern of gene loss among K. polysporus, S. cerevisiae, and three other yeasts that diverged after the WGD, we show that the patterns of gene loss changed over time. Initially, both members of a duplicate pair were equally likely to be lost, but loss of the same gene copy in independent lineages was increasingly favored at later time points. This trend parallels an increasing restriction of reciprocal gene loss to more slowly evolving gene pairs over time and suggests that, as duplicate genes diverged, one gene copy became favored over the other. The apparent low initial sequence divergence of the gene pairs leads us to propose that the yeast WGD was probably an autopolyploidization.
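
The expectation that about half of single-copy genes are paralogs under independent resolution can be illustrated with a toy calculation. The sketch below is not the paper's likelihood model. It assumes a simplified scenario in which each species reduces a WGD pair to a single copy independently, keeping copy "A" with probability p_keep_a (a hypothetical parameter introduced here). The function names are likewise illustrative. With p_keep_a = 0.5 the two species keep different copies half of the time, and a bias toward one copy lowers that fraction, in line with the later-stage trend described in the abstract.

```python
import random


def expected_paralog_fraction(p_keep_a):
    """Probability that two lineages keep different copies of a WGD pair.

    Assumes each lineage independently keeps copy A with probability
    p_keep_a and copy B otherwise (a toy assumption, not the paper's model).
    """
    return 2 * p_keep_a * (1 - p_keep_a)


def simulate_independent_loss(n_pairs, p_keep_a=0.5, seed=0):
    """Simulate n_pairs loci that are single-copy in both species.

    Returns the fraction of loci at which the two species retained
    different WGD copies, i.e. the surviving genes are paralogs
    rather than orthologs.
    """
    rng = random.Random(seed)
    paralogs = 0
    for _ in range(n_pairs):
        kept_species_1 = "A" if rng.random() < p_keep_a else "B"
        kept_species_2 = "A" if rng.random() < p_keep_a else "B"
        if kept_species_1 != kept_species_2:
            paralogs += 1
    return paralogs / n_pairs


if __name__ == "__main__":
    # Unbiased loss: roughly half of single-copy loci are paralogous.
    print(simulate_independent_loss(100_000, p_keep_a=0.5))
    # Biased loss (one copy favored): the paralogous fraction drops.
    print(expected_paralog_fraction(0.9))
```

Under this simplification, a paralog fraction close to one half is what unbiased, independent loss predicts, while a fraction well below one half would point to the same copy being favored in both lineages.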

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Conserved Sequence
  • Evolution, Molecular*
  • Gene Duplication*
  • Gene Order
  • Genes, Duplicate*
  • Genome, Fungal / genetics*
  • Kluyveromyces / genetics*
  • Likelihood Functions
  • Models, Genetic
  • Molecular Sequence Data
  • Saccharomyces cerevisiae / genetics*
  • Sequence Homology, Nucleic Acid
  • Time Factors

Associated data

  • GENBANK/AAZN00000000