Genetics. 2004 Dec;168(4):2245-60.
doi: 10.1534/genetics.104.030866.

Intragenic spatial patterns of codon usage bias in prokaryotic and eukaryotic genomes


Hong Qin et al. Genetics. 2004 Dec.

Abstract

To study the roles of translational accuracy, translational efficiency, and the Hill-Robertson effect in codon usage bias, we examined the intragenic spatial distribution of synonymous codon usage bias in four prokaryotic (Escherichia coli, Bacillus subtilis, Sulfolobus tokodaii, and Thermotoga maritima) and two eukaryotic (Saccharomyces cerevisiae and Drosophila melanogaster) genomes. We generated supersequences at each codon position across genes in a genome and computed the overall bias at each codon position. By quantitatively evaluating the trend of spatial patterns using isotonic regression, we show that in yeast and prokaryotic genomes, codon usage bias increases along the translational direction, which is consistent with purifying selection against nonsense errors. Fruit fly genes show a nearly symmetric M-shaped spatial pattern of codon usage bias, with less bias in the middle and at both ends. The low codon usage bias in the middle region is best explained by interference (the Hill-Robertson effect) between selection at different codon positions. In both yeast and fruit fly, spatial patterns of codon usage bias are characteristically different from patterns of GC-content variation. The effect of expression level on the strength of codon usage bias is more conspicuous than its effect on the shape of the spatial distribution.
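A minimal sketch of computing a bias value for one codon position, assuming the position's supersequence is simply the list of codons found at that position across genes. The abstract does not name the measure, and the figure captions only say that a large c means weak bias. Wright's effective number of codons (ENC), which runs from 20 (one codon used per amino acid) to 61 (uniform usage), is used here as an assumed stand-in with that property, not as the authors' statistic. The names `enc`, `homozygosity`, `CODE`, and `SYNONYMS` are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous codon families keyed by amino acid (stop codons excluded).
SYNONYMS = {}
for codon, aa in CODE.items():
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append(codon)


def homozygosity(counts, codons):
    """Wright's F for one amino acid: (n * sum(p_i^2) - 1) / (n - 1)."""
    n = sum(counts[c] for c in codons)
    if n < 2:
        return None  # too few observations to estimate F
    sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
    return (n * sum_p2 - 1) / (n - 1)


def enc(codons_at_position):
    """ENC for one supersequence; assumed stand-in for c (larger = weaker bias)."""
    counts = Counter(c.upper() for c in codons_at_position)
    by_class = {2: [], 3: [], 4: [], 6: []}  # degeneracy class -> F values
    for syn in SYNONYMS.values():
        if len(syn) in by_class:
            f = homozygosity(counts, syn)
            if f is not None:
                by_class[len(syn)].append(f)
    mean_f = {k: sum(v) / len(v) for k, v in by_class.items() if v}
    # Wright's rule for missing Ile: average the 2-fold and 4-fold F values.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if set(mean_f) != {2, 3, 4, 6} or any(f <= 0 for f in mean_f.values()):
        return float("nan")  # not enough data at this position
    value = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(value, 61.0)  # ENC is conventionally capped at 61
```

A position where every amino acid is always encoded by the same codon gives 20, and a position with uniform synonymous usage is capped at 61, matching the "large value represents weak bias" convention in the captions.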


Figures

Figure 1.—
Supersequences at each codon position across genes. (A) First half, read in the translational direction (forward); (B) second half, read opposite to the translational direction (backward). Each bar represents a codon position. Vertical arrows represent supersequences.
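The forward and backward supersequences of Figure 1 can be sketched as follows, assuming each gene is given as an in-frame coding sequence that includes its start and stop codons (both dropped, as in the figures). How the middle codon of a gene with an odd number of interior codons is handled is not stated; this sketch drops it, which is an assumption. `split_codons` and `supersequences` are hypothetical names.

```python
from collections import defaultdict


def split_codons(cds):
    """Split a coding sequence into codons, dropping the start and stop codons."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3].upper() for i in range(0, len(cds), 3)]
    return codons[1:-1]


def supersequences(cds_list):
    """Build forward and backward supersequences keyed by codon position.

    forward[i]:  codons at position i counted from the start (first half of each gene)
    backward[i]: codons at position i counted back from the stop (second half)
    Assumption: the middle codon of an odd-length interior is dropped.
    """
    forward = defaultdict(list)
    backward = defaultdict(list)
    for cds in cds_list:
        interior = split_codons(cds)
        half = len(interior) // 2
        for i, codon in enumerate(interior[:half]):
            forward[i].append(codon)
        for i, codon in enumerate(reversed(interior[len(interior) - half:])):
            backward[i].append(codon)
    return dict(forward), dict(backward)
```

Indexing the second half from the stop codon lets genes of different lengths be aligned at their 3' ends, which is what allows the backward panels to be compared across length intervals.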
Figure 2.—
Intragenic spatial codon usage bias in prokaryotic genomes: Escherichia coli K12 (A and B), Bacillus subtilis (C and D), Thermotoga maritima (E and F), and Sulfolobus tokodaii (G and H). Asterisks in the insets indicate trends that are significant by isotonic regression (see Table 1). The value c is plotted in the vertically inverted direction because a large value represents weak bias. Both the start and stop codons are excluded from the plots. For visual presentation, plots were smoothed by a locally weighted regression in a sliding window of 10 codons; however, the isotonic regression was applied to the original data. Genes were grouped by length, measured in the number of codons, into intervals with approximately equal numbers of genes. The average length of each interval is given in the insets. The plot of each length interval is color coded.
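Grouping genes into length intervals with approximately equal numbers of genes can be sketched as a sort followed by near-equal slicing. This is an illustrative assumption about the procedure; genes of identical length may fall on either side of an interval boundary. `length_intervals` is a hypothetical helper that returns each interval's gene indices together with its mean length Lc.

```python
def length_intervals(lengths, n_groups):
    """Partition genes into n_groups length intervals with ~equal gene counts.

    lengths: gene lengths in codons (Lc), one per gene.
    Returns a list of (gene_indices, mean_length), shortest interval first.
    """
    if not 1 <= n_groups <= len(lengths):
        raise ValueError("n_groups must be between 1 and the number of genes")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    groups = []
    for g in range(n_groups):
        # Integer boundaries spread any remainder evenly across groups.
        lo = g * len(order) // n_groups
        hi = (g + 1) * len(order) // n_groups
        idx = order[lo:hi]
        mean_len = sum(lengths[i] for i in idx) / len(idx)
        groups.append((idx, mean_len))
    return groups
```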
Figure 3.—
Trend of spatial codon usage bias as shown by the expected c values from isotonic regression. Both the start and stop codons are excluded from the analysis. Side-by-side plots of expected c+ vs. codon position for the first and second halves are presented along the translational direction. Lc is the gene length measured in the number of codons. Λ is the test statistic of isotonic regression. Two examples are presented: the Lc = 420 interval in the E. coli genome (A and B) and the Lc = 400 interval in the S. cerevisiae genome (C and D). In E. coli, the isotonic regression is applied to codon positions beyond the first 25 and before the last 25 codons. In S. cerevisiae, the regression is applied to codon positions beyond the first 20 and before the last 20 codons.
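Isotonic regression fits the least-squares monotone sequence to the per-position values, and the pool-adjacent-violators algorithm (PAVA) is a standard way to compute it. The sketch below assumes equal weights per position, trims a fixed number of codons from each end (25 for the E. coli example, 20 for S. cerevisiae), and fits a decreasing trend by negating the data. It does not reproduce the test statistic Λ, whose form is not given here; `pava` and `isotonic_trend` are hypothetical names.

```python
def pava(y):
    """Least-squares nondecreasing fit (equal weights) by pool-adjacent-violators."""
    blocks = []  # each block: [mean, number of pooled points]
    for value in y:
        blocks.append([float(value), 1])
        # Pool the last two blocks while they violate the nondecreasing order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    fitted = []
    for mean, size in blocks:
        fitted.extend([mean] * size)
    return fitted


def isotonic_trend(values, trim=25, increasing=True):
    """Monotone fit to per-position values, ignoring `trim` codons at each end."""
    core = list(values[trim:len(values) - trim])
    if increasing:
        return pava(core)
    # A nonincreasing fit is the negated nondecreasing fit of the negated data.
    return [-v for v in pava([-v for v in core])]
```

Because a large c means weak bias, bias that strengthens along the gene appears as a decreasing c, which corresponds to `increasing=False` in this sketch.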
Figure 4.—
Incremental pattern of intragenic spatial codon usage bias in S. cerevisiae (A and B) and the effect of expression on the pattern (C and D). The c value is plotted in the inverted direction. Both the start and stop codons are excluded from the plots. Plots are smoothed by a locally weighted regression in a sliding window of 10 codons (the isotonic regression was applied to the unsmoothed data). Asterisks in the insets indicate significant trends by isotonic regression (see Tables 1 and 2). The asterisk in parentheses indicates a significant trend opposite to those of the other groups. In C and D, genes are grouped into the top 20%, middle 20%, and bottom 20% by their expression levels measured by Affymetrix DNA microarrays. Genes are then divided into three equal intervals by gene length (Lc). Each group is labeled by its expression level and average gene length. The bottom 20% in yeast shows a flat pattern and is omitted for clarity.
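Selecting the top, middle, and bottom 20% of genes by expression can be sketched with a rank ordering, assuming "middle 20%" means the block of genes centered on the median rank. `expression_groups` is a hypothetical helper; each returned group would then be split into three length intervals with equal gene counts.

```python
def expression_groups(expression, fraction=0.2):
    """Indices of the top, middle, and bottom `fraction` of genes by expression.

    Assumption: the middle group is the block of ranks centered on the median.
    """
    n = len(expression)
    k = int(n * fraction)
    order = sorted(range(n), key=lambda i: expression[i])  # lowest expression first
    bottom = order[:k]
    mid_start = (n - k) // 2
    middle = order[mid_start:mid_start + k]
    top = order[n - k:]
    return {"top": top, "middle": middle, "bottom": bottom}
```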
Figure 5.—
Intragenic spatial pattern of GC content in S. cerevisiae and the effect of expression on the pattern. Both the start and stop codons are excluded from the plots. Plots are smoothed by a locally weighted regression in a sliding window of 10 codons. Asterisks in the insets indicate significant trends by isotonic regression (see Table 3). Genes are grouped in the same way as in Figure 4, C and D.
Figure 6.—
The M-shaped pattern of the intragenic spatial codon usage bias in D. melanogaster (A and B) and the effect of expression level on the pattern (C and D). The c value is plotted in the inverted direction. Asterisks in A and B indicate significant trends in the middle section by isotonic regression (see Table 1). Plots are smoothed by a locally weighted regression in a sliding window of 10 codons (the isotonic regression was applied to the unsmoothed data). Genes are grouped by gene length. Each group contains similar numbers of genes and is labeled by its average gene length (Lc). The plot of each length interval is color coded.
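A simplified locally weighted regression in a sliding window of 10 codons is sketched below. It assumes the window is the 10 positions nearest each codon position, uses tricube weights with a local linear fit, and omits the robustness iterations of full LOWESS, so it approximates rather than reproduces the smoothing used for the figures. `local_linear_smooth` is a hypothetical name.

```python
def local_linear_smooth(y, window=10):
    """Tricube-weighted local linear regression over the `window` nearest positions.

    Simplified LOWESS (no robustness iterations); positions are 0, 1, 2, ...
    """
    n = len(y)
    window = min(window, n)
    smoothed = []
    for x0 in range(n):
        # A block of `window` consecutive positions around x0, clipped to the ends.
        lo = min(max(0, x0 - window // 2), n - window)
        xs = range(lo, lo + window)
        d_max = max(abs(x - x0) for x in xs) + 1  # +1 keeps the farthest weight > 0
        w = [(1 - (abs(x - x0) / d_max) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y[x] for wi, x in zip(w, xs)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y[x] - my) for wi, x in zip(w, xs))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(my + slope * (x0 - mx))  # local line evaluated at x0
    return smoothed
```

Because each local fit is a straight line, a perfectly linear profile passes through unchanged; the smoother only damps curvature and noise, which is one reason trend tests are run on the unsmoothed values.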
Figure 7.—
The intragenic spatial pattern of GC-content variation in D. melanogaster. GC content is estimated from the first and second bases of the codon at each position. Genes are first partitioned by expression level as in Figure 6, C and D, and then grouped by gene length measured in the number of codons (Lc).
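GC content from the first and second codon bases (often written GC12) at each position can be sketched as follows, assuming the input maps each codon position to the list of codons observed there across genes, i.e., the supersequences of Figure 1. `gc12_by_position` is a hypothetical helper.

```python
def gc12_by_position(supersequences):
    """Fraction of G or C among first and second codon bases at each position.

    supersequences: dict mapping codon position -> list of codons at that position.
    """
    result = {}
    for pos, codons in supersequences.items():
        bases = [b for c in codons for b in c[:2].upper()]  # skip the third base
        if bases:
            result[pos] = sum(b in "GC" for b in bases) / len(bases)
    return result
```

Excluding the third base keeps this profile largely independent of synonymous choices, which is what makes it a useful contrast to the codon usage bias patterns.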
Figure 8.—
The effect of introns on the intragenic spatial patterns of codon usage bias (A and B) and GC-content variation (C and D) in D. melanogaster. Only the longest groups of genes are presented. Equal numbers of intronless and two-exon genes are used to ensure appropriate comparisons.
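Drawing equal numbers of intronless and two-exon genes can be sketched as random subsampling of the larger set down to the size of the smaller one. The caption does not say how the genes were chosen, so random sampling with a fixed seed is an assumption; `balanced_sample` is a hypothetical name.

```python
import random


def balanced_sample(intronless, two_exon, seed=0):
    """Subsample both gene sets (at random) to the size of the smaller set."""
    rng = random.Random(seed)  # fixed seed for a reproducible draw
    n = min(len(intronless), len(two_exon))
    return rng.sample(list(intronless), n), rng.sample(list(two_exon), n)
```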