De Novo Genes Arise at a Slow but Steady Rate along the Primate Lineage and Have Been Subject to Incomplete Lineage Sorting

Genome Biol Evol. 2016 Apr 25;8(4):1222-32. doi: 10.1093/gbe/evw074.

Abstract

De novo protein-coding gene origination is increasingly recognized as an important evolutionary mechanism. However, there remains a large amount of uncertainty regarding the frequency of these events and the mechanisms and speed of gene establishment. Here, we describe a rigorous search for cases of de novo gene origination in the great apes. We analyzed annotated proteomes as well as full genomic DNA and transcriptional and translational evidence. It is notable that results vary between database updates due to the fluctuating annotation of these genes. Nonetheless we identified 35 de novo genes: 16 human-specific; 5 human and chimpanzee specific; and 14 that originated prior to the divergence of human, chimpanzee, and gorilla and are found in all three genomes. The taxonomically restricted distribution of these genes cannot be explained by loss in other lineages. Each gene is supported by an open reading frame-creating mutation that occurred within the primate lineage, and which is not polymorphic in any species. Similarly to previous studies we find that the de novo genes identified are short and frequently located near pre-existing genes. Also, they may be associated with Alu elements and prior transcription and RNA-splicing at the locus. Additionally, we report the first case of apparent independent lineage sorting of a de novo gene. The gene is present in human and gorilla, whereas chimpanzee has the ancestral noncoding sequence. This indicates a long period of polymorphism prior to fixation and thus supports a model where de novo genes may, at least initially, have a neutral effect on fitness.

Keywords: de novo genes; human; incomplete lineage sorting; new genes; primates.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Evolution, Molecular*
  • Genetic Loci
  • Gorilla gorilla / genetics
  • Hominidae / genetics*
  • Humans
  • Macaca / genetics
  • Open Reading Frames
  • Pan troglodytes / genetics
  • Polymorphism, Genetic
  • Pongo / genetics
  • RNA, Untranslated / genetics

Substances

  • RNA, Untranslated