Genome Res. 2003 Oct;13(10):2213-9.
doi: 10.1101/gr.1311003.

An Evolutionary Analysis of Orphan Genes in Drosophila

Free PMC article

Tomislav Domazet-Loso et al. Genome Res.

Abstract

Orphan genes are protein-coding regions that have no recognizable homolog in distantly related species. A substantial fraction of coding regions in any genome sequenced consists of orphan genes, but the evolutionary and functional significance of orphan genes is not understood. We present a reanalysis of the Drosophila melanogaster proteome that shows that there are still between 26% and 29% of all proteins without a significant match with noninsect sequences, and that these orphans are underrepresented in genetic screens. To analyze the characteristics of orphan genes in Drosophila, we used sequence comparisons between cDNAs retrieved from two Drosophila yakuba libraries and their corresponding D. melanogaster orthologs. We find that a cDNA library from adults yields twice as many orphan genes as such a library from embryos. The orphan genes evolve on average more than three times faster than nonorphan genes, although the width of the evolutionary rate distribution is similar for the two classes. In particular, some orphan genes show very low substitution rates that are comparable to otherwise highly conserved genes. We propose a model suggesting that orphans may be involved in the evolution of adaptive traits, and that slow-evolving orphan genes may be particularly interesting candidate genes for identifying lineage-specific adaptations.

Figures

Figure 1
(A) Percentage of orphans found in each cutoff category. The broken lines indicate the BLAST E-value range of 10^-3 to 10^-5, for which we find 26%-29% orphan genes and the highest odds ratio (see below). (B) Odds ratios for genetically studied genes in the different cutoff classes. The values indicate how much more likely one is to find a genetically studied gene in the nonorphan class for a given cutoff. All values are highly significant (P ≪ 0.001, Fisher's exact test).
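As a rough illustration of the statistic in panel B, the following sketch computes an odds ratio and a two-sided Fisher's exact test p-value for a 2x2 table (rows: nonorphan, orphan; columns: genetically studied, not studied). The function names and the counts are hypothetical placeholders, not values from the paper, and the authors' exact procedure may differ.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration):
      a = nonorphan, genetically studied   b = nonorphan, not studied
      c = orphan, genetically studied      d = orphan, not studied
    """
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical counts, chosen only to exercise the functions.
example = (300, 700, 30, 270)
print("odds ratio:", odds_ratio(*example))
print("p-value:", fisher_exact_two_sided(*example))
```

An odds ratio above 1 in this layout would mean that genetically studied genes are more likely to fall in the nonorphan class, which is the direction the caption describes.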
Figure 2
Scatterplot of the nucleotide substitution rates at synonymous (dS) and nonsynonymous (dN) sites for the embryo library (top) and adult library (bottom). (•) orphan genes; (○) nonorphan genes. The mean of the dN values is marked as a solid line for the orphan genes and as a dashed line for the nonorphan genes. Genes for which the null hypothesis that dN and dS are equal cannot be rejected are marked with a star.
Figure 3
Discrete distribution of dN/dS ratios for the embryo (top) and the adult (bottom) library. The percentages of genes falling into the respective dN/dS value classes are represented by black (orphans) and gray (nonorphans) columns. Similar distribution patterns are obtained for dN alone (data not shown). Note the logarithmic scale for representing the dN/dS ratio classes.
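A distribution of this kind can be tabulated by assigning each gene's dN/dS ratio to a logarithmic class and reporting the percentage of genes per class. The sketch below assumes one class per decade and skips genes with a zero rate; both choices, the function name, and the sample values are illustrative assumptions, since the class boundaries actually used in the figure are not stated here.

```python
from collections import Counter
from math import floor, log10


def dnds_class_percentages(pairs):
    """Percentage of genes per logarithmic dN/dS class.

    pairs: iterable of (dN, dS) substitution-rate estimates.
    Returns {k: percentage}, where class k covers ratios in
    [10**k, 10**(k + 1)). Genes with dN <= 0 or dS <= 0 are skipped
    because their ratio has no finite logarithm (an assumption of
    this sketch, not necessarily the paper's handling).
    """
    counts = Counter()
    for dn, ds in pairs:
        if dn <= 0 or ds <= 0:
            continue
        counts[floor(log10(dn / ds))] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: 100.0 * n / total for k, n in sorted(counts.items())}


# Hypothetical (dN, dS) values, not data from the paper.
sample = [(0.005, 0.1), (0.02, 0.1), (0.05, 0.1), (0.002, 0.1)]
print(dnds_class_percentages(sample))
```

Running the same function separately on the orphan and nonorphan gene sets would give the two series that such a figure compares.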
Figure 4
Model for the evolution of orphan genes. The model assumes an initial gene duplication, after which selective constraints in one of the duplicated genes become relaxed. This allows a fast evolutionary divergence (left), indicated by a long branch in the topology. After a lineage splitting event, the gene may become integrated into a new central function in one lineage, but not in the other, where it continues to evolve quickly because of reduced constraints. The new function in the first lineage implies that the gene would go through a phase of adaptive evolution, which would also result in a long branch, depending on how many amino acid changes occurred during the phase of adaptation. But once an adaptive peak is reached, further evolution is slowed down and the branches become short. At this time, the gene may have lost all sequence similarity to its parent gene, but not necessarily its structural similarity. The parent gene (right topology) would undergo the same lineage splitting events, but would continue to have short branches in all lineages, because it has retained its original function. This model suggests the existence of three types of divergence modes: (1) fast divergence of genes that may or may not yet have lost their sequence similarity to their parent gene, (2) fast divergence due to positive selection, and (3) slow-evolving orphan genes. Note that the model would apply in a similar way if the initial gene had been created not through a pure gene duplication, but through recruitment and recombination of exons from other genes, or even after a gene had lost its original function in the context of a speciation event.
