Hominoid-specific de novo protein-coding genes originating from long non-coding RNAs

PLoS Genet. 2012 Sep;8(9):e1002942. doi: 10.1371/journal.pgen.1002942. Epub 2012 Sep 13.


Tinkering with pre-existing genes has long been known as a major way to create new genes. Recently, however, motherless protein-coding genes have been found to have emerged de novo from ancestral non-coding DNA. How these genes originated has not been well addressed to date. Here we identified 24 hominoid-specific de novo protein-coding genes with precise origination timing in the vertebrate phylogeny. Strand-specific RNA-Seq analyses were performed in five rhesus macaque tissues (liver, prefrontal cortex, skeletal muscle, adipose, and testis), and the resulting data were integrated with public transcriptome data from human, chimpanzee, and rhesus macaque. By comparing the RNA expression profiles in the three species, we found that most of the hominoid-specific de novo protein-coding genes were expressed as polyadenylated non-coding RNAs in rhesus macaque or chimpanzee, with a similar transcript structure and a correlated tissue expression profile. According to the rule of parsimony, the majority of these hominoid-specific de novo protein-coding genes appear to have acquired a regulated transcript structure and expression profile before acquiring coding potential. Interestingly, although the expression profiles were largely correlated, the coding genes in human often showed higher transcriptional abundance than their non-coding counterparts in rhesus macaque. The major findings reported here are robust and insensitive to the parameters used in the identification and analysis of de novo genes. Our results suggest that at least a portion of long non-coding RNAs, especially those with active and regulated transcription, may serve as a birth pool for protein-coding genes, which are then further optimized at the transcriptional level.
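The cross-species comparison described above can be illustrated with a minimal sketch. It assumes that a "correlated tissue expression profile" is measured as a Pearson correlation across the five tissues and that relative abundance is summarized as a ratio of summed expression; the abstract does not state which statistics the authors used. The function name `pearson` and all expression values are hypothetical and invented purely for illustration.

```python
from math import sqrt

# The five rhesus macaque tissues named in the abstract.
TISSUES = ["liver", "prefrontal cortex", "skeletal muscle", "adipose", "testis"]


def pearson(xs, ys):
    """Pearson correlation between two equally long expression profiles."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-tissue expression values (not data from the study),
# listed in the same order as TISSUES. The macaque profile is chosen to
# have the same shape as the human one at half the abundance.
human_coding = [1.0, 4.0, 0.5, 1.5, 12.0]
macaque_noncoding = [0.5, 2.0, 0.25, 0.75, 6.0]

# Profile similarity: how well the tissue pattern is conserved.
r = pearson(human_coding, macaque_noncoding)

# Abundance comparison: total human expression relative to macaque.
abundance_ratio = sum(human_coding) / sum(macaque_noncoding)

print(f"profile correlation: {r:.3f}")
print(f"human / macaque abundance ratio: {abundance_ratio:.2f}")
```

In this toy case the two profiles share a tissue pattern while the human transcript is more abundant, which mirrors the qualitative pattern the abstract reports. A rank-based statistic such as Spearman correlation would be an equally plausible choice.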

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Evolution, Molecular*
  • Hominidae / genetics*
  • Humans
  • Macaca mulatta / genetics
  • Open Reading Frames / genetics*
  • Pan troglodytes / genetics
  • Phylogeny
  • RNA, Long Noncoding / genetics*
  • Species Specificity
  • Tissue Distribution / genetics
  • Transcriptome


Substances

  • RNA, Long Noncoding

Grants and funding

This work was supported by the National Basic Research Program of China [2011CB518000] (http://www.973.gov.cn/), the National Natural Science Foundation of China [31171269] (http://www.nsfc.gov.cn/Portal0/default106.htm), and the National High-Tech R&D Program [2007AA02Z165] (http://www.most.gov.cn/eng/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.