The role of the codon first letter in the relationship between genomic GC content and protein amino acid composition

Res Microbiol. Jan-Feb 1999;150(1):21-32. doi: 10.1016/s0923-2508(99)80043-6.


Analysis of the statistical distribution of amino acid compositions within 22 protein families shows that a GC bias generally affects proteins of diverse function from the extreme thermophile Thermus. This bias produces a clear enrichment in the amino acids L, V, A, P, R and G and an underrepresentation of I, M, F, S, T, C and W. The strong compositional biases observed in Thermus proteins are not related to thermoadaptation, since they were also found in mesophilic homologues encoded by GC-rich genes. A comparative analysis of large samples of translated sequences from 30 organisms, representing the three major kingdoms of life and including extremophiles, indicates a universal correlation between the usage of particular amino acids and genomic GC content. It is concluded that the first letter of the codon plays a dominant role in translating the genomic GC signature into protein amino acid composition and sequence.
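
The link between the two amino acid groups and the codon first letter can be checked directly against the standard genetic code. The sketch below is an illustration only, not the authors' method: it assumes the standard code (NCBI translation table 1), and the function names `first_letter_gc_fraction` and `gc_by_codon_position` are hypothetical helpers introduced here. For each amino acid it reports the fraction of synonymous codons that begin with G or C, and for a coding sequence it reports the GC fraction at each of the three codon positions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in the order
# TTT, TTC, TTA, TTG, TCT, ... ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Groups as listed in the abstract.
ENRICHED = "LVAPRG"
DEPLETED = "IMFSTCW"


def first_letter_gc_fraction(amino_acid):
    """Fraction of an amino acid's synonymous codons whose first base is G or C."""
    codons = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
    return sum(c[0] in "GC" for c in codons) / len(codons)


def gc_by_codon_position(cds):
    """GC fraction at codon positions 1, 2 and 3 of an in-frame coding sequence.

    Trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1
    n_codons = usable // 3
    return [c / n_codons for c in counts] if n_codons else [0.0, 0.0, 0.0]


if __name__ == "__main__":
    for group_name, group in (("enriched", ENRICHED), ("depleted", DEPLETED)):
        for aa in group:
            print(group_name, aa, round(first_letter_gc_fraction(aa), 2))
```

Under the standard code, every amino acid in the enriched group has at least two thirds of its codons starting with G or C, while no codon for any amino acid in the underrepresented group does. This agrees with the abstract's conclusion about the first codon letter, but it does not reproduce the paper's statistical analysis.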

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence / genetics*
  • Archaea / chemistry
  • Archaea / genetics
  • Archaeal Proteins / chemistry
  • Bacterial Proteins / chemistry*
  • Codon / genetics*
  • Cytosine
  • Genes, Bacterial / genetics*
  • Guanine
  • Phylogeny
  • Statistical Distributions
  • Temperature
  • Thermus / chemistry*
  • Thermus / genetics
  • Thermus / growth & development

Substances

  • Archaeal Proteins
  • Bacterial Proteins
  • Codon
  • Guanine
  • Cytosine