Mol Biol Evol. 2019 Oct 1;36(10):2328-2339.
doi: 10.1093/molbev/msz124.

Elucidation of Codon Usage Signatures across the Domains of Life

Eva Maria Novoa et al. Mol Biol Evol.

Abstract

Because of the degeneracy of the genetic code, multiple codons are translated into the same amino acid. Despite being "synonymous," these codons are not used equally. Selective pressures are thought to drive the choice among synonymous codons within a genome, while GC content, which is typically attributed to mutational drift, is the major determinant of variation across species. Here, we find that, in addition to GC content, interspecies codon usage signatures can also be detected. More specifically, we show that a single amino acid, arginine, is the major contributor to codon usage bias differences across domains of life. We then exploit this finding and show that domain-specific codon bias signatures can be used to classify a given sequence into its corresponding domain of life with high accuracy. Next, we asked whether including codon autocorrelation patterns, which reflect the nonrandom distribution of codon occurrences throughout a transcript, might improve the classification performance of our algorithm. However, we find that autocorrelation patterns are not domain-specific and, surprisingly, are unrelated to tRNA reusage, in contrast to previous reports. Instead, our results suggest that codon autocorrelation patterns are a by-product of codon optimality throughout a sequence: highly expressed genes display autocorrelated "optimal" codons, whereas lowly expressed genes display autocorrelated "nonoptimal" codons.

Keywords: codon autocorrelation; codon usage; evolution; tRNA.
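Codon autocorrelation, as used in the abstract, refers to codons being reused along a transcript more (or less) often than chance predicts. A minimal Python sketch of one such statistic follows; it is an illustration under stated assumptions, not the authors' method. For each degenerate amino acid it walks successive occurrences within one coding sequence, counts how often the next occurrence reuses the identical codon, and compares that count with the expectation if codons were drawn independently from the sequence's own synonymous frequencies (the sum of squared frequencies, per transition). The helper names `codons` and `reuse_stats` are hypothetical.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split a coding sequence into in-frame codons (U is read as T)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def reuse_stats(seq):
    """Return (observed, expected) counts of identical-codon reuse between
    successive occurrences of the same amino acid in one sequence.

    Expected counts assume independent draws from the sequence's own
    synonymous codon frequencies (a simplifying assumption of this sketch).
    """
    by_aa = defaultdict(list)
    for c in codons(seq):
        aa = CODON_TO_AA.get(c)
        # Skip invalid codons, stop codons, and single-codon amino acids (Met, Trp).
        if aa and aa not in "*MW":
            by_aa[aa].append(c)
    observed = 0
    expected = 0.0
    for occurrences in by_aa.values():
        n = len(occurrences)
        if n < 2:
            continue
        freqs = Counter(occurrences)
        p_same = sum((k / n) ** 2 for k in freqs.values())
        observed += sum(a == b for a, b in zip(occurrences, occurrences[1:]))
        expected += (n - 1) * p_same
    return observed, expected


obs, exp = reuse_stats("CGT" * 4 + "AGA" * 4)
print(obs, exp)  # prints: 6 3.5
```

An observed count well above the expected one signals autocorrelated codon choice; the abstract attributes such patterns to codon optimality rather than to tRNA reusage.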

Figures

Fig. 1.
Analysis of codon usage bias across the three domains of life. (A) Hierarchical clustering of the average relative synonymous codon usage (RSCU) for each species (n = 1,625). Horizontal bars indicate the GC content and domain of life of each species; they show that RSCU clusters species primarily by GC content rather than by domain.
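RSCU is conventionally computed as a codon's observed count divided by the mean count of all codons encoding the same amino acid, so 1.0 means no preference and the values within one amino acid sum to its number of codons. The sketch below assumes codon counts are pooled across all of a species' coding sequences (the caption does not state how per-species averages were formed) and also computes overall GC content, the variable shown in the horizontal bars. `rscu` and `gc_content` are hypothetical helper names.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split a coding sequence into in-frame codons (U is read as T)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def rscu(seqs):
    """RSCU pooled over a collection of coding sequences (stop codons skipped).

    RSCU(codon) = observed count / mean count of its synonymous family.
    Amino acids absent from the input get no entry (RSCU is undefined).
    """
    counts = Counter(c for s in seqs for c in codons(s) if c in CODON_TO_AA)
    families = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total:
            mean = total / len(family)
            for c in family:
                values[c] = counts[c] / mean
    return values


def gc_content(seqs):
    """Fraction of G + C over all nucleotides in the given sequences."""
    joined = "".join(seqs).upper()
    return sum(joined.count(b) for b in "GC") / len(joined) if joined else 0.0


r = rscu(["CGTCGTCGT", "CGCCGCCGC"])
print(r["CGT"], r["AGA"])  # prints: 3.0 0.0
```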
Fig. 2.
(A) Scatter plot of the first two principal components of the matrix of RSCU values in figure 1A. Each dot represents a species, colored according to its domain. (B) PCA loadings plot for the first two principal components, with each codon colored according to its last nucleotide: G (orange), C (red), A (blue), or T (purple). A species' PC1 score is determined primarily by differences in the frequencies of codons ending in G or C versus A or T, whereas PC2 is driven mainly by differences in arginine codon frequencies, more than by those of any other amino acid. See also supplementary table S1, Supplementary Material online for individual contributions of codons to each PC. (C) Boxplot of arginine codon usage for each domain of life, showing that Archaea favor AGA and AGG codons, Bacteria favor CGC and CGU codons, and Eukarya show intermediate preferences. (D) 3D scatter plot of each species' first three principal component scores, computed using only the RSCU values of arginine codons as input, showing that arginine codon usage alone discriminates among domains. See also supplementary figure S1, Supplementary Material online for additional principal components and supplementary table S1, Supplementary Material online for individual contributions of each codon to PC2.
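Panels A, B, and D rest on principal component analysis (PCA) of a matrix with one row per species and one column per codon RSCU value. The NumPy sketch below assumes PCA on column-centered, unscaled values computed via singular value decomposition; whether the authors scaled columns or used another PCA routine is not stated here. Restricting the input to the six arginine columns corresponds to the idea behind panel D. `pca` and `loadings_by_last_base` are hypothetical names.

```python
import numpy as np


def pca(matrix, n_components=2):
    """PCA via SVD of the column-centered matrix.

    matrix: rows = species (or sequences), columns = codon RSCU values.
    Returns (scores, loadings, explained_variance_ratio).
    """
    X = np.asarray(matrix, dtype=float)
    Xc = X - X.mean(axis=0)                           # center each codon column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # species coordinates (panel A)
    loadings = Vt[:n_components].T                    # codon weights per PC (panel B)
    explained = (S ** 2 / np.sum(S ** 2))[:n_components]
    return scores, loadings, explained


def loadings_by_last_base(loadings, codon_names):
    """Group PC1 loadings by each codon's last nucleotide (panel B coloring)."""
    groups = {}
    for name, weight in zip(codon_names, loadings[:, 0]):
        groups.setdefault(name[-1], []).append(float(weight))
    return groups
```

Component signs from SVD are arbitrary, so a flipped axis relative to a published plot is not an error.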
Fig. 3.
Codon usage bias clusters sequences into their corresponding domains. (A) 3D scatter plot of the first three principal component scores for all EMBLCDS sequences included in the analysis. Each dot represents a sequence, colored by its domain of life: Archaea (blue), Bacteria (red), Eukarya (green). (B) Density histograms of the PC scores for each domain: PC1 (left), PC2 (middle), PC3 (right).
Fig. 4.
Codon preferences in Saccharomyces cerevisiae as a function of expression level. Codon preferences, represented by relative synonymous codon usage (RSCU), are reversed between highly and lowly expressed genes for some amino acids but not for others. Although codon usage varies within a genome, intragenome differences are small enough that individual sequences still cluster by domain, as seen in figure 3. See also supplementary figure S2, Supplementary Material online for equivalent plots in Escherichia coli.
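The reversal of preferences between expression classes can be checked by finding, for each amino acid, the codon with the highest RSCU (equivalently, the highest count) in each gene set. A minimal sketch, assuming the gene sets have already been split by expression level; ties keep the first codon encountered, and `preferred_codons` and `reversed_preferences` are hypothetical names. The sequences in the usage line are toy inputs, not yeast data.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split a coding sequence into in-frame codons (U is read as T)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def preferred_codons(seqs):
    """Most-used codon per degenerate amino acid, pooled over a gene set.

    Within one amino acid the highest count is also the highest RSCU.
    Stops, invalid codons, Met and Trp are ignored.
    """
    counts = Counter()
    for s in seqs:
        counts.update(c for c in codons(s) if CODON_TO_AA.get(c, "*") not in "*MW")
    best = {}
    for codon, n in counts.items():
        aa = CODON_TO_AA[codon]
        if aa not in best or n > counts[best[aa]]:
            best[aa] = codon
    return best


def reversed_preferences(high, low):
    """Amino acids whose preferred codon differs between high- and low-expression sets."""
    ph, pl = preferred_codons(high), preferred_codons(low)
    return {aa: (ph[aa], pl[aa]) for aa in sorted(ph.keys() & pl.keys()) if ph[aa] != pl[aa]}


print(reversed_preferences(["CGTCGTAGA", "GCTGCT"], ["AGAAGACGT", "GCTGCC"]))
# prints: {'R': ('CGT', 'AGA')}
```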
Fig. 5.
Taxonomic classification of sequences using codon usage bias. (A) 3D scatter plot of the first three principal component scores, together with the Support Vector Machine (SVM) hyperplanes computed on the training set. Each dot represents a sequence, colored according to its domain of life. (B) ROC curves of the SVM class probabilities, computed separately for each domain. See also supplementary figure S3, Supplementary Material online.
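Panel B reports one-vs-rest ROC curves. The SVM itself is not reproduced here; assuming a table of per-sequence class probabilities from any probabilistic classifier, the sketch below computes each domain's area under the ROC curve using the rank-based (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen sequence from the domain scores higher than a randomly chosen sequence from the other domains, with ties counting one half. `roc_auc` and `one_vs_rest_auc` are hypothetical names.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank statistic.

    scores: classifier scores (higher = more likely positive)
    labels: booleans, True for the positive class
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(prob_rows, true_domains, domains=("Archaea", "Bacteria", "Eukarya")):
    """Per-domain AUC, treating each domain in turn as the positive class.

    prob_rows: one dict per sequence mapping domain -> predicted probability.
    """
    return {
        d: roc_auc([row[d] for row in prob_rows], [t == d for t in true_domains])
        for d in domains
    }


print(roc_auc([0.9, 0.4, 0.6, 0.2], [True, True, False, False]))  # prints: 0.75
```

The pairwise comparison is quadratic in the number of sequences, which is fine for illustration; a sort-based rank computation scales better for large test sets.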
Fig. 6.
Analysis of codon covariation across species does not support a universal tRNA recycling model. Codon covariation is measured over all pairs consisting of a codon and the next codon encoding the same amino acid, shown for Saccharomyces cerevisiae (A) and Homo sapiens (B). Values are expressed as standard deviations (SD) from expected. Each codon is labeled with its decoding tRNA, following parsimony-extended wobble rules when no Watson-Crick-matching tRNA isoacceptor is available (as per gtRNAdb, Chan and Lowe 2016). Pairs are shaded according to the number of standard deviations from expected: dark gray (> +3 SD; strongly favored codon pair), light gray (0–3 SD; slightly favored codon pair), white (≤ 0 SD; nonfavored codon pair). In yeast, most codon pairs decoded by the same tRNA are overrepresented, consistent with a tRNA recycling model, but this does not hold in other species. See also supplementary figure S4, Supplementary Material online for similar codon covariation analyses for Escherichia coli and Plasmodium falciparum.
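The covariation values are described only as standard deviations from expected. One way to obtain such a score is sketched below, under two explicit assumptions that may differ from the authors' null model: expected pair counts come from independent draws of pooled synonymous codon frequencies, and the standard deviation uses a binomial approximation. Each occurrence of an amino acid is paired with the next occurrence of the same amino acid in the same gene, and `shade` maps the resulting z-score to the figure's three bins. All function names are hypothetical.

```python
import math
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split a coding sequence into in-frame codons (U is read as T)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def covariation_z(genes):
    """z-scores of successive same-amino-acid codon pairs versus independence."""
    codon_counts, pair_counts, n_pairs = Counter(), Counter(), Counter()
    for seq in genes:
        last = {}  # previous codon seen for each amino acid; reset per gene
        for c in codons(seq):
            aa = CODON_TO_AA.get(c, "*")
            if aa in "*MW":  # skip stops, invalid codons, Met, Trp
                continue
            codon_counts[c] += 1
            if aa in last:
                pair_counts[(last[aa], c)] += 1
                n_pairs[aa] += 1
            last[aa] = c
    aa_totals = Counter()
    for c, n in codon_counts.items():
        aa_totals[CODON_TO_AA[c]] += n
    z = {}
    for aa, n in n_pairs.items():
        family = [c for c in codon_counts if CODON_TO_AA[c] == aa]
        for c1 in family:
            for c2 in family:
                p = (codon_counts[c1] / aa_totals[aa]) * (codon_counts[c2] / aa_totals[aa])
                sd = math.sqrt(n * p * (1 - p))  # binomial approximation
                if sd > 0:
                    z[(c1, c2)] = (pair_counts[(c1, c2)] - n * p) / sd
    return z


def shade(z_value):
    """Map a z-score to the figure's shading bins."""
    if z_value > 3:
        return "dark gray"
    if z_value > 0:
        return "light gray"
    return "white"


z = covariation_z(["CGTCGTCGTAGAAGAAGA"])
print(shade(z[("CGT", "CGT")]))  # prints: light gray
```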
Fig. 7.
Codon covariation is likely a consequence of the co-occurrence of “optimal” codons in highly expressed proteins and of “nonoptimal” codons in lowly expressed proteins. Codon covariation for Saccharomyces cerevisiae, as depicted in figure 6, highlighting pairs formed by two optimal codons (dark green), two nonoptimal codons (red), and codons of intermediate optimality (yellow). Optimal and nonoptimal codons are defined as those that are, respectively, highly and lowly abundant in highly expressed proteins; their relative abundance is shown for each individual amino acid and codon. See also supplementary figure S5, Supplementary Material online for the analysis of all amino acids in S. cerevisiae and for the same analysis in Escherichia coli.
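Given codon optimality labels, the pair coloring in this figure reduces to a simple rule. The sketch assumes optimality is derived from RSCU in highly expressed genes using hypothetical cutoffs (1.3 and 0.7) that are placeholders, not the authors' definitions, and it assumes that mixed optimal/nonoptimal pairs fall in the yellow class. `label_codons` and `pair_color` are hypothetical names, and the RSCU values in the usage line are toy numbers.

```python
def label_codons(rscu_high, optimal_min=1.3, nonoptimal_max=0.7):
    """Label codons by their RSCU in highly expressed genes.

    The default cutoffs are hypothetical placeholders.
    """
    labels = {}
    for codon, value in rscu_high.items():
        if value >= optimal_min:
            labels[codon] = "optimal"
        elif value <= nonoptimal_max:
            labels[codon] = "nonoptimal"
        else:
            labels[codon] = "intermediate"
    return labels


def pair_color(c1, c2, labels):
    """Color a codon pair: both optimal, both nonoptimal, or anything else."""
    kinds = (labels[c1], labels[c2])
    if kinds == ("optimal", "optimal"):
        return "dark green"
    if kinds == ("nonoptimal", "nonoptimal"):
        return "red"
    # Intermediate codons and (by assumption) mixed optimal/nonoptimal pairs.
    return "yellow"


labels = label_codons({"CGT": 2.0, "AGA": 0.2, "CGC": 1.0})  # toy RSCU values
print(pair_color("CGT", "CGT", labels))  # prints: dark green
```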
Fig. 8.
Hierarchical clustering of codon covariation patterns across species spanning the three domains of life. Each codon pair is colored according to its relative synonymous codon pair usage (RSCPU) value. The bar above the heatmap indicates the domain of each species. See also supplementary figure S6, Supplementary Material online.
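The clustering step can be illustrated with average-linkage (UPGMA-style) agglomerative clustering of per-species profiles. The sketch assumes each species is described by a dictionary of precomputed RSCPU values keyed by codon pair, uses Euclidean distance, and, assuming RSCPU is a ratio in which 1.0 means "as expected", fills pairs missing from a species with 1.0; the distance metric and linkage used in the figure are not stated here. `average_linkage` is a hypothetical name.

```python
import math


def average_linkage(profiles):
    """Average-linkage agglomerative clustering of species profiles.

    profiles: {species: {(codon1, codon2): rscpu_value}}
    Returns the merge history as a list of (members, distance) tuples.
    """
    pairs = sorted(set().union(*(p.keys() for p in profiles.values())))
    # Missing pairs are treated as neutral (1.0), an assumption of this sketch.
    vectors = {s: [p.get(k, 1.0) for k in pairs] for s, p in profiles.items()}
    clusters = [frozenset([s]) for s in profiles]
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Mean Euclidean distance over all cross-cluster species pairs.
                d = sum(math.dist(vectors[x], vectors[y]) for x in a for y in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append((merged, d))
    return history
```

For real data with many species and codon-pair columns, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would be the practical choice.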

References

    1. Akashi H. 1994. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136(3):927–935.
    2. Alneberg J, Bjarnason BS, de Bruijn I, Schirmer M, Quick J, Ijaz UZ, Lahti L, Loman NJ, Andersson AF, Quince C. 2014. Binning metagenomic contigs by coverage and composition. Nat Methods 11(11):1144–1146.
    3. Bazzini AA, Del Viso F, Moreno-Mateos MA, Johnstone TG, Vejnar CE, Qin Y, Yao J, Khokha MK, Giraldez AJ. 2016. Codon identity regulates mRNA stability and translation efficiency during the maternal-to-zygotic transition. EMBO J 35(19):2087–2103.
    4. Bonekamp F, Jensen KF. 1988. The AGG codon is translated slowly in E. coli even at very low expression levels. Nucleic Acids Res 16(7):3013–3024.
    5. Brady A, Salzberg SL. 2009. Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nat Methods 6(9):673–676.
