Codon usages in different gene classes of the Escherichia coli genome

Mol Microbiol. 1998 Sep;29(6):1341-55. doi: 10.1046/j.1365-2958.1998.01008.x.


A new measure for assessing the codon bias of one group of genes with respect to a second group of genes is introduced. Using this measure, codon bias correlations for Escherichia coli genes are evaluated with respect to level of expression, contrasts along genes, genes in different 200 kb (or longer) contigs around the genome, effects of gene size, variation over different function classes, codon bias in relation to possible lateral transfer, and dicodon bias for some gene classes. Among the function classes, the codon biases of ribosomal proteins deviate most from the codon frequencies of the average E. coli gene. Other classes of 'highly expressed genes' (e.g. aminoacyl-tRNA synthetases, chaperonins, and modification genes essential to translation) show less extreme codon biases. Consistently, among genes with experimentally determined expression rates in the exponential growth phase, those with the highest molar abundances deviate more from the average gene codon frequencies and are more similar in codon frequencies to the average ribosomal protein gene. Independent of gene size, the codon biases in the 5' third of genes deviate by more than a factor of two from those in the middle and 3' thirds. In this context, there appear to be conflicting selection pressures imposed by the constraints of ribosomal binding; more generally, the early phase of protein synthesis (about the first 50 codons) may be more biased than the complete nascent polypeptide. When the E. coli genome is partitioned into 10 segments of equal length, pronounced differences in codon site 3 G+C frequencies emerge: genes near oriC have 5% greater codon site 3 G+C frequencies than genes from the ter region. This difference is also observed between small (100-300 codons) and large (>800 codons) genes. This result contrasts with eukaryotic genomes (including human, Caenorhabditis elegans and yeast), where long genes tend to have a more AT-rich site 3 than short genes.
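The abstract does not spell out the formula for the new measure. The Python sketch below assumes it takes the form B(G|F) = Σ_a p_a(G) Σ_c |g(c) − f(c)|, where g(c) and f(c) are the frequencies of codon c normalized within its amino acid family for gene groups G and F, and p_a(G) is the frequency of amino acid a in G. That form, the handling of absent families, and the names `codon_counts`, `codon_bias` and `gc3` are assumptions for illustration. A helper for the codon site 3 G+C frequency discussed above is included.

```python
from collections import Counter

# Standard genetic code; first, second and third bases iterate over T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Sense codons grouped by the amino acid they encode (stop codons excluded).
FAMILIES = {}
for codon, aa in CODE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)


def codon_counts(seqs):
    """Count sense codons over the complete codons of each coding sequence."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for p in range(0, len(seq) - 2, 3):
            codon = seq[p:p + 3]
            if CODE.get(codon, "*") != "*":
                counts[codon] += 1
    return counts


def codon_bias(g_seqs, f_seqs):
    """Assumed form of B(G|F): sum over amino acids a of p_a(G) * sum_c |g(c) - f(c)|."""
    g, f = codon_counts(g_seqs), codon_counts(f_seqs)
    total_g = sum(g.values())
    b = 0.0
    for codons in FAMILIES.values():
        n_g = sum(g[c] for c in codons)
        n_f = sum(f[c] for c in codons)
        if n_g == 0 or n_f == 0:
            continue  # assumption: skip families absent from either group
        diff = sum(abs(g[c] / n_g - f[c] / n_f) for c in codons)
        b += (n_g / total_g) * diff
    return b


def gc3(seq):
    """Fraction of G or C at codon site 3, over complete codons only."""
    seq = seq.upper()
    third = [seq[p + 2] for p in range(0, len(seq) - 2, 3)]
    return sum(base in "GC" for base in third) / len(third) if third else 0.0
```

Because the amino acid weights come from G, the measure is asymmetric: under this sketch, scoring ribosomal protein genes against all genes would call `codon_bias(rp_seqs, all_seqs)` (hypothetical variable names), which need not equal the reverse.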
Many of the above results are specific to E. coli genes and do not apply to the genes of most bacterial genomes. A gene is defined as alien (possibly horizontally transferred) if its codon bias relative to the average gene exceeds a high threshold and its codon bias relative to ribosomal proteins is also appropriately high. Such genes are identified, including four clusters (operons). Most of these genes have no known function.
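The alien-gene rule above is stated only qualitatively. A minimal sketch of applying it, assuming the two codon bias values per gene (relative to the average gene and relative to ribosomal proteins) are already computed, is shown below; it also groups runs of adjacent flagged genes in chromosome order as candidate clusters. The thresholds, the minimum run length, and the function names are hypothetical placeholders, not values or code from the paper.

```python
from typing import Dict, List, Sequence, Set, Tuple


def flag_alien(scores: Dict[str, Tuple[float, float]],
               t_avg: float, t_rp: float) -> List[str]:
    """Flag genes whose bias vs the average gene AND vs ribosomal proteins
    both exceed (hypothetical) thresholds; scores maps gene -> (B_avg, B_rp)."""
    return [gene for gene, (b_avg, b_rp) in scores.items()
            if b_avg > t_avg and b_rp > t_rp]


def alien_clusters(ordered_genes: Sequence[str], alien: Set[str],
                   min_len: int = 3) -> List[List[str]]:
    """Collect runs of consecutive alien genes (in chromosome order) that
    contain at least min_len genes, as candidate operon-like clusters."""
    clusters: List[List[str]] = []
    run: List[str] = []
    for gene in ordered_genes:
        if gene in alien:
            run.append(gene)
            continue
        if len(run) >= min_len:
            clusters.append(run)
        run = []
    if len(run) >= min_len:
        clusters.append(run)
    return clusters
```

Requiring both conditions mirrors the rule in the abstract: a highly expressed gene also deviates from the average gene, but its codons resemble those of ribosomal proteins, so its bias relative to ribosomal proteins stays low and it is not flagged.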

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Amino Acyl-tRNA Synthetases / genetics
  • Animals
  • Bacterial Proteins / chemistry
  • Bacterial Proteins / genetics
  • Base Composition
  • Chromosomes, Bacterial / genetics
  • Codon / genetics*
  • Coliphages / genetics
  • DNA, Bacterial / genetics
  • DNA, Viral / genetics
  • Escherichia coli / enzymology
  • Escherichia coli / genetics*
  • Escherichia coli / growth & development
  • Gene Expression
  • Genes, Bacterial*
  • Genome, Bacterial*
  • Humans
  • Operon
  • Protein Biosynthesis
  • Protein Folding
  • Ribosomal Proteins / genetics
  • Species Specificity

Substances

  • Bacterial Proteins
  • Codon
  • DNA, Bacterial
  • DNA, Viral
  • Ribosomal Proteins
  • Amino Acyl-tRNA Synthetases