How the enormous structural and functional diversity of new genes and proteins was generated (an estimated 10¹⁰ to 10¹² different proteins exist across all organisms on earth [Choi I-G, Kim S-H. 2006. Evolution of protein structural classes and protein sequence families. Proc Natl Acad Sci 103: 14056-14061]) is a central biological question with a long and rich history. Extensive work during the last 80 years has shown that new genes that play important roles in lineage-specific phenotypes and adaptation can originate through a multitude of mechanisms, including duplication, lateral gene transfer, gene fusion/fission, and de novo origination. In this review, we focus on two main processes as generators of new functions: the evolution of new genes by duplication and divergence of pre-existing genes, and de novo gene origination, in which a whole protein-coding gene evolves from a noncoding sequence.
Copyright © 2015 Cold Spring Harbor Laboratory Press; all rights reserved.
Mechanisms of new gene acquisition. (A) Horizontal gene transfer. The foreign gene (yellow) is transferred from another organism and integrated into the genome by recombination. (B) De novo origination. Mutations in a previously nonfunctional sequence create a new gene (yellow). (C) Duplication–divergence. A duplicate of an ancestral gene (green) acquires a new function and becomes a new gene (yellow).
Formation and fates of duplications and amplifications. (A) Unequal crossover between two direct repeats (green rectangles) on sister chromatids results in a duplication of the intervening sequence. Unequal exchange between the two copies results in either loss or further amplification. (B) Amplification through rolling-circle replication. A double-strand break leads to single-strand invasion of a homologous sequence (green rectangle) on the same chromosome. Replication from the site of invasion leads to rolling-circle amplification of the sequence between the repeats. Homologous recombination with another chromosome completes the amplification. The thickness of the arrows reflects the rates of duplication (k_dupl), amplification (k_ampl), and segregation (k_loss).
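The three rates in this figure can be read as transitions in a simple copy-number model. The Python sketch that follows is an illustration under stated assumptions, not a model taken from the review: each cell's copy number is a discrete state, a single copy duplicates at a per-generation rate `k_dupl`, two or more copies gain one more copy at `k_ampl` or lose one at `k_loss`, copy number is capped at a hypothetical `n_max`, and selection is ignored entirely. All numeric rates are placeholders.

```python
def step(freqs, k_dupl, k_ampl, k_loss):
    """Advance copy-number frequencies by one generation (no selection).

    freqs[i] is the fraction of cells carrying i + 1 copies of the region.
    Assumes gain + loss <= 1 for every state so frequencies stay valid.
    """
    n_max = len(freqs)
    new = [0.0] * n_max
    for i, f in enumerate(freqs):
        copies = i + 1
        # Gain of a copy: duplication from 1 copy, amplification from >= 2 copies;
        # no gain is allowed at the hypothetical cap n_max.
        if copies == n_max:
            gain = 0.0
        elif copies == 1:
            gain = k_dupl
        else:
            gain = k_ampl
        # Loss of a copy by unequal exchange; a single copy cannot be lost here.
        loss = k_loss if copies > 1 else 0.0
        new[i] += f * (1.0 - gain - loss)
        if gain > 0.0:
            new[i + 1] += f * gain
        if loss > 0.0:
            new[i - 1] += f * loss
    return new


def simulate(generations, n_max, k_dupl, k_ampl, k_loss):
    """Start from a clonal single-copy population and iterate step()."""
    freqs = [1.0] + [0.0] * (n_max - 1)
    for _ in range(generations):
        freqs = step(freqs, k_dupl, k_ampl, k_loss)
    return freqs


if __name__ == "__main__":
    # Hypothetical per-generation rates, chosen only so that k_dupl << k_ampl, k_loss.
    freqs = simulate(5000, n_max=5, k_dupl=1e-4, k_ampl=1e-2, k_loss=2e-2)
    for copies, f in enumerate(freqs, start=1):
        print(f"{copies} copies: {f:.2e}")
```

With `n_max=2` the model reduces to duplication versus loss, and the duplicated fraction settles at `k_dupl / (k_dupl + k_loss)`. When `k_dupl` is much smaller than `k_loss`, duplicated cells therefore remain a small minority unless selection enriches them.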
The innovation–amplification–divergence (IAD) model. An ancestral protein possesses a promiscuous side-activity “b” in addition to its main activity “A.” An environmental change makes the b activity beneficial (Innovation). Selection to increase expression of the b activity leads to enrichment of duplications and amplifications (Amplification). Mutant variants (yellow) with improved b activity “B” are selectively amplified, whereas less-improved variants may be lost. When a mutant variant with sufficient B activity appears, selection to maintain the amplification is relaxed, and the amplification segregates. If selection to keep the original A activity is present throughout the process, the end result is a duplication in which one copy has the new B function and the other has the original A function. Positive selection is the driving force for the entire process.
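Why an amplification first pays off and later segregates can be illustrated with a toy fitness function. In this Python sketch, which is an assumption for illustration and not a model from the review, the benefit grows with total b activity (copy number times per-copy activity) until it meets a hypothetical demand `required`, and every extra copy carries a hypothetical `cost_per_copy`. The copy number favored by selection then falls as per-copy activity improves from a weak "b" toward an improved "B".

```python
def fitness(copies, activity_per_copy, required=1.0, cost_per_copy=0.01):
    """Hypothetical fitness: benefit saturates once total activity meets demand,
    while each copy beyond the first carries a small cost."""
    benefit = min(1.0, copies * activity_per_copy / required)
    return benefit - cost_per_copy * (copies - 1)


def optimal_copies(activity_per_copy, n_max=20):
    """Copy number (1..n_max) that maximizes the toy fitness function."""
    return max(range(1, n_max + 1), key=lambda n: fitness(n, activity_per_copy))


if __name__ == "__main__":
    # Per-copy activity rising from a weak side-activity toward an improved one.
    for a in (0.1, 0.3, 0.5, 1.0):
        print(f"activity per copy {a:.1f} -> favored copy number {optimal_copies(a)}")
```

With these placeholder values the favored copy number drops from 10 to 4, 2, and finally 1 copy, mirroring the caption's point that selection to maintain the amplification is relaxed once a variant with sufficient B activity appears.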
Evolution of antifreeze proteins (AFPs) in Antarctic eelpout. (A) AFPs in Antarctic eelpout have evolved from the carboxyl terminus of sialic acid synthase (SAS). Closely related fish without AFPs have two SAS genes next to each other in the genome. In Antarctic eelpout, a transposon (LdCR1-3) has been inserted between the SAS genes. (B) At another chromosomal locus in Antarctic eelpout, there is a truncated copy of the transposon (LdCR1-3) along with >30 tandem copies of the newly evolved AFP gene. This transposition event is not found in closely related fish that are not adapted to cold conditions. The AFPs derive from the carboxyl terminus of the SAS-B gene, which shows a weak intrinsic ice-binding activity. The amplified copies have accumulated point mutations that contribute to ice binding, as well as an amino-terminal secretion signal that directs extracellular secretion. L. dearborni, Lycodichthys dearborni; G. aculeatus, Gasterosteus aculeatus.
An experimental test of the innovation–amplification–divergence (IAD) model.
Salmonella enterica strains carrying a bifunctional gene, hisA, with two weak activities, one in histidine biosynthesis (original activity) and one in tryptophan biosynthesis (new activity), were placed under selection for 3000 generations to improve both activities. After 1000 and 2000 generations, selected lineages were split into new lineages, as indicated in the white text boxes. (A) Trajectories of evolution. Green symbols indicate gene variants that are sufficient for growth in the absence of both histidine and tryptophan (generalists). Blue symbols indicate variants that support growth in the absence of histidine but not tryptophan (HisA specialists). Yellow symbols indicate variants that support growth in the absence of tryptophan but not histidine (TrpF specialists). Numbers on the left indicate the generation at which each variant was observed. The black and white bars on the gene symbols indicate mutations in the evolved variants compared with the ancestral hisA gene. (B) Trajectories of evolution of enzyme activities. The circled letters indicate examples of evolved gene variants (highlighted with the same letters in A). Each gene variant was placed as a single copy on the Salmonella chromosome. Growth rate in the presence of histidine but absence of tryptophan (y-axis) was used as a measure of TrpF activity, and growth rate in the presence of tryptophan but absence of histidine (x-axis) was used as a measure of HisA activity. gen., Generations.
De novo gene evolution. Genes can evolve de novo through several mechanisms. Transcription of protogenes (noncoding RNAs with open reading frames [ORFs], overlapping gene ORFs, and intergenic regions) leads to ribosomal association and translation of the message. Translated peptides might confer a selective advantage through weak promiscuous activities. The mechanism described in the innovation–amplification–divergence (IAD) model would then operate to increase the effectiveness of the protogenes through positive selection and promote the birth of novel genes. Some protogenes never make the transition to being selectively advantageous in the long run and become pseudogenized.
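The protogenes in this figure include noncoding RNAs that carry open reading frames. As a minimal illustration of what an ORF means computationally, this Python sketch scans the three forward reading frames for ATG-initiated stretches that end in a standard stop codon (TAA, TAG, TGA). The minimum length `min_codons` is a hypothetical threshold, and the scan deliberately ignores the reverse strand and non-ATG start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def find_orfs(seq, min_codons=10):
    """Return sorted (start, end) pairs of ATG-initiated ORFs on the forward strand.

    end is exclusive and includes the stop codon; min_codons counts codons from
    the start codon up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG after the stop
    return sorted(orfs)
```

For example, `find_orfs("CC" + "ATG" + "AAA" * 3 + "TAA" + "GG", min_codons=4)` returns `[(2, 17)]`, while raising `min_codons` to 5 returns an empty list.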