Orphan genes, which lack detectable homologues in other lineages, may contribute to a variety of biological functions. However, the mechanisms underlying their origination and function remain largely unknown. Here, using a comprehensive and systematic computational pipeline, we identified 893 orphan genes in the C. elegans lineage, of which only a small fraction (0.9%) were derived from transposable elements. We also identified six new protein-coding genes that originated de novo from non-coding DNA sequences in the C. elegans genome. The authenticity and functionality of these orphan and de novo genes are supported by three lines of evidence: transcriptional data, in silico proteomic data, and fixation status in wild populations. Orphan genes and de novo genes exhibit simple gene structures, such as short protein length and fewer exons, and are frequently X-linked. RNA-seq analysis showed that the orphan genes are enriched for expression during embryonic development and in the gonad, and gene ontology enrichment analysis further supported their potential function in early development. The de novo genes were significantly expressed in the gonad, and functional enrichment analysis of their co-expressed genes suggested that they may be involved in signal transduction pathways and metabolic processes. Our results present the first systematic evidence on the evolution of orphan genes and the de novo origin of genes in nematodes, and on their impact on functional and phenotypic evolution, and thus may shed new light on the importance of these new genes.
Keywords: Caenorhabditis elegans; de novo genes; orphan genes.