On the Origin of De Novo Genes in Arabidopsis thaliana Populations

Genome Biol Evol. 2016 Aug 3;8(7):2190-202. doi: 10.1093/gbe/evw164.

Abstract

De novo genes, which originate from ancestral nongenic sequences, are one of the most important sources of protein-coding genes. This origination process is crucial for the adaptation of organisms. However, how de novo genes arise and become fixed in a population or species remains largely unknown. Here, we identified 782 de novo genes from the model plant Arabidopsis thaliana and divided them into three types based on the availability of translational evidence, transcriptional evidence, and neither transcriptional nor translational evidence for their origin. Importantly, by integrating multiple types of omics data, including data from genomes, epigenomes, transcriptomes, and translatomes, we found that epigenetic modifications (DNA methylation and histone modification) play an important role in the origination process of de novo genes. Intriguingly, using the transcriptomes and methylomes from the same population of 84 accessions, we found that de novo genes that are transcribed in approximately half of the total accessions within the population are highly methylated, with lower levels of transcription than those transcribed at other frequencies within the population. We hypothesized that, during the origin of de novo gene alleles, those neutralized to low expression states via DNA methylation have relatively high probabilities of spreading and becoming fixed in a population. Our results highlight the process underlying the origin of de novo genes at the population level, as well as the importance of DNA methylation in this process.

Keywords: DNA methylation; de novo genes; origin; process; protein-coding genes.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Arabidopsis / genetics*
  • Arabidopsis Proteins / genetics*
  • Arabidopsis Proteins / metabolism
  • DNA Methylation
  • Epigenesis, Genetic
  • Evolution, Molecular*
  • Transcriptome

Substances

  • Arabidopsis Proteins