Genome Res, 19 (10), 1752-9

Recent De Novo Origin of Human Protein-Coding Genes

David G Knowles et al. Genome Res.

Abstract

The origin of new genes is extremely important to evolutionary innovation. Most new genes arise from existing genes through duplication or recombination. The origin of new genes from noncoding DNA is extremely rare, and very few eukaryotic examples are known. We present evidence for the de novo origin of at least three human protein-coding genes since the divergence with chimp. Each of these genes has no protein-coding homologs in any other genome, but is supported by evidence from expression and, importantly, proteomics data. The absence of these genes in chimp and macaque cannot be explained by sequencing gaps or annotation error. High-quality sequence data indicate that these loci are noncoding DNA in other primates. Furthermore, chimp, gorilla, gibbon, and macaque share the same disabling sequence difference, supporting the inference that the ancestral sequence was noncoding over the alternative possibility of parallel gene inactivation in multiple primate lineages. The genes are not well characterized, but interestingly, one of them was first identified as an up-regulated gene in chronic lymphocytic leukemia. This is the first evidence for entirely novel human-specific protein-coding genes originating from ancestrally noncoding sequences. We estimate that 0.075% of human genes may have originated through this mechanism leading to a total expectation of 18 such cases in a genome of 24,000 protein-coding genes.
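The closing estimate is a simple proportion. As a minimal check, using only the 0.075% rate and the 24,000-gene count quoted in the abstract (the variable names are illustrative):

```python
# Both figures come from the abstract; the variable names are illustrative.
de_novo_fraction = 0.075 / 100      # 0.075% expressed as a fraction
protein_coding_genes = 24_000       # genome size assumed in the abstract

# Expected number of de novo genes, rounded to a whole gene count.
expected_de_novo_genes = round(de_novo_fraction * protein_coding_genes)
print(expected_de_novo_genes)  # 18
```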

Figures

Figure 1.
Schematic of analysis pipeline. The expected location of genes with no BLASTP hit was scrutinized for any evidence of a homologous protein-coding gene. The expected location of a gene is indicated by green shading and was defined as a 10-gene window on either side of the gene of interest, projected onto the syntenic location in the other genome. Candidate genes were excluded if there was a sequencing gap in the expected location (or local inversions that rendered the expected location ambiguous), or if similar sequence at the expected location could not be clearly excluded from producing a protein.
Figure 2.
Sequence changes in the origin of CLLU1 from noncoding DNA. (A) Region of conserved synteny between human and chimp chromosomes 12. Genes are indicated by rectangular boxes and the region of chromosome is indicated by a horizontal line. Unambiguous 1:1 orthologs that were used to infer the synteny block are shown in red. One gene in this region, chronic lymphocytic leukemia up-regulated gene 1 (CLLU1), had no BLASTP hits in any other genome and is shown in green. (B) Multiple sequence alignment of the gene sequence of the human gene CLLU1 and similar nucleotide sequences from the syntenic location in chimp and macaque. The start codon is located immediately following the first alignment gap, which was inserted for clarity. Stop codons are indicated by red boxes. The sequenced peptide identified from this locus is indicated in orange. The critical mutation that allows the production of a protein is the deletion in human of an A nucleotide that is present in both chimp and macaque (indicated by an arrow). This causes a frameshift in human that results in a much longer ORF capable of producing a 121-amino-acid protein. Both the chimp and macaque sequences have a stop codon after only 42 potential codons. (C) Alignment of the region around the critical human enabler mutation with similar nucleotide sequences from the syntenic regions in chimp and macaque, and with sequence traces from gorilla, gibbon, and orangutan. For gorilla, gibbon, and orangutan the trace database accession number is shown on the right. The disabler is also shared by gorilla and gibbon, indicating it is ancestral.
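The effect described in panel B, where deleting a single base shifts the reading frame and removes an early stop codon, can be sketched on a toy sequence. The sequences below are invented for illustration and are not the CLLU1 locus; the helper names `orf_codon_count` and `delete_base` are likewise hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def orf_codon_count(seq):
    """Count codons from the first ATG up to, but not including, the first
    in-frame stop codon. If no stop is found, count to the last full codon."""
    start = seq.find("ATG")
    if start == -1:
        return 0
    count = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count

def delete_base(seq, index):
    """Return seq with the single base at position index removed."""
    return seq[:index] + seq[index + 1:]

# Invented toy sequence (NOT real CLLU1 data): the A at index 5 puts a TAG
# stop codon in frame right after the second codon.
toy_ancestral = "ATGGCATAGCCTGGAACGTTGA"

# Deleting that A shifts the frame, so the early TAG is no longer read as a codon.
toy_human = delete_base(toy_ancestral, 5)

print(orf_codon_count(toy_ancestral))  # 2
print(orf_codon_count(toy_human))      # 6
```

In the toy case the open reading frame grows from 2 to 6 codons; the caption reports the analogous change at the real locus as 42 potential codons in chimp and macaque versus a 121-amino-acid protein in human.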
Figure 3.
Sequence changes in the origin of C22orf45 from noncoding DNA. As in Figure 2: (A) Region of conserved synteny between human and chimp chromosomes 22. One gene in this region, C22orf45, had no BLASTP hits in any other genome and is shown in green. (B) Multiple sequence alignment of the gene sequence of C22orf45 and similar nucleotide sequences from the syntenic location in chimp and macaque. The arrow indicates the location of an in-frame stop codon shared by chimp and macaque that would result in premature termination (red box) irrespective of the other disablements. The codons highlighted with a yellow box indicate the stop codon including all disablements (indels) in chimp and macaque for the reading frame starting from the same location as the human start (note the ATG start codon is absent in macaque and that the frameshifts mean the hypothetical protein sequence is drastically altered). (C) The disabler is also shared by gorilla, orangutan, and gibbon indicating it is ancestral.
Figure 4.
Sequence changes in the origin of DNAH10OS from noncoding DNA. As in Figures 2 and 3: (A) region of conserved synteny between human and chimp chromosomes 12. One gene in this region, DNAH10OS, had no BLASTP hits in any other genome and is shown in green. (B) Multiple sequence alignment of the gene sequence of DNAH10OS and similar nucleotide sequences from the syntenic location in chimp and macaque. If the ORF began at the same position as the human start codon (note the start codon is present in chimp but absent in macaque), the macaque hypothetical protein sequence would be very different from the human protein due to frameshifts and would terminate at the stop codon indicated in yellow. The arrow indicates the location of a 10-bp indel shared by chimp and macaque that would result in premature termination irrespective of the other disablements. (C) The disabler is also shared by gorilla, orangutan, and gibbon indicating that this is a human-specific 10-bp insertion.
