A Comprehensive Analysis of Transcript-Supported De Novo Genes in Saccharomyces sensu stricto Yeasts

Mol Biol Evol. 2017 Nov 1;34(11):2823-2838. doi: 10.1093/molbev/msx210.

Abstract

Novel genes arising from random DNA sequences (de novo genes) have been suggested to be widespread in the genomes of different organisms. However, our knowledge about the origin and evolution of de novo genes is still limited. To systematically understand the general features of de novo genes, we established a robust pipeline to analyze >20,000 transcript-supported coding sequences (CDSs) from the budding yeast Saccharomyces cerevisiae. Our analysis pipeline combined phylogeny, synteny, and sequence alignment information to identify possible orthologs across 20 Saccharomycetaceae yeasts and discovered 4,340 S. cerevisiae-specific de novo genes and 8,871 S. sensu stricto-specific de novo genes. We further combined information on CDS positions and transcript structures to show that >65% of de novo genes arose from transcript isoforms of ancient genes, especially in the upstream and internal regions of ancient genes. Fourteen of the identified de novo genes with high transcript levels were chosen for verification of protein expression. Ten of them, including eight transcript isoform-associated CDSs, showed translation signals, and five proteins exhibited specific cytosolic localizations. Our results suggest that de novo genes frequently arise in the S. sensu stricto complex and have the potential to be quickly integrated into ancient cellular networks.
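The distinction between the two de novo classes reported above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that ortholog detection (the phylogeny, synteny, and alignment steps) has already produced, for each CDS, the set of species in which a possible ortholog was found. The species list, the function name `classify_cds`, and the "ancient" label for everything else are illustrative assumptions, since the abstract does not list the 20 genomes or the exact decision rules.

```python
from typing import Set

# Assumption: an illustrative subset of Saccharomyces sensu stricto species.
# The abstract does not say which genomes the study actually used.
SENSU_STRICTO = {
    "S. cerevisiae",
    "S. paradoxus",
    "S. mikatae",
    "S. kudriavzevii",
    "S. uvarum",
}


def classify_cds(species_with_ortholog: Set[str]) -> str:
    """Classify one S. cerevisiae CDS by where possible orthologs were found.

    `species_with_ortholog` is a hypothetical input: the species (other than,
    or including, S. cerevisiae itself) in which an upstream step detected a
    possible ortholog of the CDS.
    """
    others = species_with_ortholog - {"S. cerevisiae"}
    if not others:
        # No possible ortholog outside S. cerevisiae.
        return "S. cerevisiae-specific"
    if others <= SENSU_STRICTO:
        # Possible orthologs only within the sensu stricto group.
        return "sensu stricto-specific"
    # At least one possible ortholog in a more distant yeast.
    return "ancient"


if __name__ == "__main__":
    print(classify_cds(set()))
    print(classify_cds({"S. paradoxus", "S. mikatae"}))
    print(classify_cds({"S. paradoxus", "K. lactis"}))
```

In this sketch a single possible ortholog outside the sensu stricto group is enough to treat a CDS as ancient. A real analysis would need stricter criteria for what counts as an ortholog, which is where the synteny and alignment evidence would come in.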

Keywords: S. sensu stricto yeast; de novo gene; novel gene; synteny analysis; transcript isoform; yeast evolution; yeast genomics.

MeSH terms

  • Base Sequence / genetics
  • Databases, Nucleic Acid
  • Evolution, Molecular
  • Genes, Fungal / genetics
  • Mutation Rate
  • Phylogeny
  • Saccharomyces / genetics
  • Saccharomyces cerevisiae / genetics*
  • Sequence Alignment / methods
  • Sequence Analysis, DNA / methods
  • Synteny / genetics