Amino acid runs and genomic compositional biases in vertebrates

Genomics. 2004 Mar;83(3):502-7. doi: 10.1016/j.ygeno.2003.09.004.


A compositional analysis was performed on a sample of 50 zebrafish proteins containing at least one alanine run and on their open reading frames (ORFs). The poly(Ala) proteins tended to contain runs of other amino acids as well (His/H, Gln/Q, Ser/S, Pro/P). Both their ORFs as a whole and the first and second codon positions had higher GC contents than a reference gene set. The "universal" correlation between the GC content of the first+second and third codon positions (GC1+2 vs GC3) does not hold, but I provide an explanation in terms of genomic heterogeneity. A significant correlation between AHQS content and GC3 was obtained, reflecting codon bias favoring G/C at the third codon position of these amino acids. A correspondence analysis (COA) of relative synonymous codon usage showed that the poly(Ala) proteins have a biased distribution along the second axis of the COA, which correlates with gene expression in zebrafish. A comparison with human is also presented.
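
The quantities named in the abstract (GC1+2, GC3, and amino acid runs) can be illustrated with a minimal Python sketch. This is not the author's pipeline: the function names are hypothetical, the ORF is assumed to be in frame, and the minimum run length (`min_len=4`) is an assumption chosen for illustration, since the abstract does not state the threshold used.

```python
import re


def gc_by_codon_position(orf):
    """Return (GC1, GC2, GC3) as fractions for an in-frame ORF string."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("ORF must contain at least one complete codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if orf[i] in "GC":
            counts[i % 3] += 1  # i % 3 is the codon position (0, 1, 2)
    return tuple(c / n_codons for c in counts)


def gc12_and_gc3(orf):
    """Return (GC1+2, GC3), where GC1+2 is the mean of GC1 and GC2."""
    gc1, gc2, gc3 = gc_by_codon_position(orf)
    return (gc1 + gc2) / 2, gc3


def find_runs(protein, residue="A", min_len=4):
    """Return (start, length) for each run of `residue` of at least min_len.

    min_len=4 is an illustrative assumption, not a value from the paper.
    """
    pattern = re.compile(re.escape(residue) + "{%d,}" % min_len)
    return [(m.start(), m.end() - m.start())
            for m in pattern.finditer(protein.upper())]


# Four alanine codons: GC-rich at positions 1 and 2, mixed at position 3.
gc12, gc3 = gc12_and_gc3("GCTGCCGCAGCG")
runs = find_runs("MAAAAAGHHAAAK")
```

Computing the pair (GC1+2, GC3) for each gene in a sample gives the points whose correlation the abstract refers to. All four alanine codons start with GC, so an alanine-rich protein necessarily pushes GC1 and GC2 upward, whatever its third-position usage.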

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / chemistry
  • Amino Acids / genetics*
  • Animals
  • Base Composition*
  • Codon
  • Genome
  • Humans
  • Linear Models
  • Peptides / chemistry
  • Peptides / genetics
  • Peptides / metabolism
  • Statistics as Topic
  • Zebrafish / genetics
  • Zebrafish / metabolism

Substances

  • Amino Acids
  • Codon
  • Peptides
  • polyalanine