Genetics. 2008 Apr;178(4):2429-32.
doi: 10.1534/genetics.107.086405.

Nearly neutrality and the evolution of codon usage bias in eukaryotic genomes

Sankar Subramanian. Genetics. 2008 Apr.

Abstract

Here I show that the mean codon usage bias of a genome, and of the lowly expressed genes in a genome, is largely similar across eukaryotes ranging from unicellular protists to vertebrates. Conversely, this bias in housekeeping genes and in highly expressed genes has a remarkable inverse relationship with species generation time that varies by more than four orders of magnitude. The relevance of these results to the nearly neutral theory of molecular evolution is discussed.

Figures

Figure 1.
Relationship between codon usage bias and generation time of eukaryotes. The protein-coding sequences of complete or nearly complete genomes of 20 eukaryotic species were obtained from various public data banks. Gene expression data in the form of expressed sequence tags (ESTs) were obtained from dbEST (http://www.ncbi.nlm.nih.gov), and the ESTs were matched to their respective genes with BLASTN, following the method described previously (Duret and Mouchiroud 2000). The species data set was chosen on the basis of the availability of a large number of genes as well as their corresponding gene expression data. The species were also chosen to represent the major groups of eukaryotes and to cover a wide distribution of generation times. Furthermore, EST data were chosen over microarray data (or other expression data) purely because they were available for all the species used in this study. To estimate the codon usage bias, the ENC′ method (Novembre 2002) was employed using the software ENC prime (http://home.uchicago.edu/∼jnovembre/software/software.html). Although a recent report pointed out a drawback of the ENC′ method, this drawback does not affect analyses in which the codon bias estimates are used in a relative manner, such as in correlations (Fuglsang 2006).
The numbers of genes in the genomes, translational genes, lowly expressed genes (with 1 EST), highly expressed genes (top 1%), and generation time (days) of the species used are as follows: Anopheles gambiae (4877, 50, 804, 39, 10); Apis mellifera (7854, 124, 911, 59, 40); Arabidopsis thaliana (26,536, 69, 2405, 70, 45); Bos taurus (18,895, 185, 1784, 48, 730); Caenorhabditis elegans (20,043, 136, 1674, 64, 3); Canis familiaris (19,599, 191, 2961, 77, 330); Danio rerio (23,482, 126, 2549, 41, 90); Dictyostelium discoideum (13,147, 102, 819, 29, 0.3); Drosophila melanogaster (13,982, 114, 1444, 63, 12); Entamoeba histolytica (9531, 128, 471, 11, 0.42); Gallus gallus (9518, 48, 2272, 57, 150); Homo sapiens (28,015, 226, 4515, 111, 7300); Mus musculus (30,079, 219, 3406, 105, 65); Oryza sativa (23,311, 276, 3980, 121, 135); Saccharomyces cerevisiae (6687, 258, 1219, 27, 0.1); Strongylocentrotus purpuratus (17,472, 78, 1915, 63, 365); Tetrahymena thermophila (27,355, 120, 3655, 98, 0.13); Tribolium castaneum (9221, 49, 1456, 36, 70); Trypanosoma cruzi (15,546, 145, 2521, 41, 1); and Xenopus tropicalis (5477, 47, 371, 51, 120). The sources of generation-time information are given in supplemental Table 1. (A) The correlation of the codon usage bias (ENC′) of all genes of the genomes (open circles) and that of the genes involved in translation (consisting predominantly of ribosomal genes, tRNA synthetases, and initiation and elongation factors) (solid circles) with generation time. The x-axis is shown on a log scale. Spearman's coefficient for the genome is ρ = −0.15 (P = 0.52) and for translational genes is ρ = 0.77 (P = 0.0008). (B) The relationship of the ENC′ estimated for the genes with low (open circles) and high (solid circles) expression levels (excluding the translational genes) with generation time. Spearman's coefficient for the lowly expressed genes is ρ = −0.08 (P = 0.74) and for the highly expressed genes is ρ = 0.74 (P = 0.0014).
(C) The log–log relationship between ΔENC′ and species generation time. Here ΔENC′ = (ENC′L − ENC′TH)/ENC′L, where ENC′TH is the average codon usage bias of translational + highly expressed genes and ENC′L is that of lowly expressed genes. Spearman's coefficient for all species is ρ = −0.87 (P = 0.0002) and for the vertebrate subset is ρ = −0.89 (P = 0.029). The best-fitting linear regression lines are shown.
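The quantities in the legend can be sketched in a few lines of Python. This is a minimal illustration, not the author's analysis pipeline. The generation times are copied from the legend above; the function names (`delta_enc`, `average_ranks`, `spearman_rho`) and the ENC′ values in the usage lines are hypothetical, since the legend does not list per-species ENC′ estimates.

```python
import math

# Generation times in days, copied from the figure legend.
GENERATION_TIME_DAYS = {
    "Anopheles gambiae": 10, "Apis mellifera": 40,
    "Arabidopsis thaliana": 45, "Bos taurus": 730,
    "Caenorhabditis elegans": 3, "Canis familiaris": 330,
    "Danio rerio": 90, "Dictyostelium discoideum": 0.3,
    "Drosophila melanogaster": 12, "Entamoeba histolytica": 0.42,
    "Gallus gallus": 150, "Homo sapiens": 7300,
    "Mus musculus": 65, "Oryza sativa": 135,
    "Saccharomyces cerevisiae": 0.1,
    "Strongylocentrotus purpuratus": 365,
    "Tetrahymena thermophila": 0.13, "Tribolium castaneum": 70,
    "Trypanosoma cruzi": 1, "Xenopus tropicalis": 120,
}


def delta_enc(enc_low, enc_th):
    """Delta ENC' = (ENC'_L - ENC'_TH) / ENC'_L, as defined in panel C."""
    if enc_low <= 0:
        raise ValueError("ENC'_L must be positive")
    return (enc_low - enc_th) / enc_low


def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Range of generation times in orders of magnitude (log10 of max/min).
times = list(GENERATION_TIME_DAYS.values())
span = math.log10(max(times) / min(times))

# HYPOTHETICAL (generation time, ENC'_L, ENC'_TH) triples, for illustration only.
toy = [(0.1, 55.0, 40.0), (12, 54.0, 46.0), (7300, 53.0, 52.0)]
toy_rho = spearman_rho([t for t, _, _ in toy],
                       [delta_enc(low, th) for _, low, th in toy])
```

Because Spearman's ρ depends only on ranks, taking logarithms of the generation times or of ΔENC′ leaves the coefficient unchanged. The log–log axes in panel C therefore matter for the regression line, not for ρ.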

References

    Akashi, H., 1995. Inferring weak selection from patterns of polymorphism and divergence at “silent” sites in Drosophila DNA. Genetics 139: 1067–1076.
    Akashi, H., 1997. Codon bias evolution in Drosophila. Population genetics of mutation-selection drift. Gene 205: 269–278.
    Chao, L., and D. E. Carr, 1993. The molecular clock and the relationship between population-size and generation time. Evolution 47: 688–690.
    Dong, H., L. Nilsson and C. G. Kurland, 1996. Co-variation of tRNA abundance and codon usage in Escherichia coli at different growth rates. J. Mol. Biol. 260: 649–663.
    Duret, L., and D. Mouchiroud, 2000. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol. 17: 68–74.
