The correlation of protein hydropathy with the base composition of coding sequences

Gene. 1999 Sep 30;238(1):3-14. doi: 10.1016/s0378-1119(99)00257-7.

Abstract

The "universal correlation" (D'Onofrio, G., Bernardi, G., 1992. A universal compositional correlation among codon positions. Gene 110, 81-88.) that holds between <GC3> and <GC1> or <GC2> (<GC> values are the average values of the coding sequences of each genome analyzed), at both the inter- and intra-genomic levels, was re-analyzed on a vastly larger dataset. The results showed a slight but significant difference between prokaryotes and eukaryotes in the <GC3> vs. <GC1> correlation. This finding prompted an analysis of the correlation between <GC3> and the amino acid frequencies of the encoded proteins, which showed that the <GC3> values of coding sequences are positively correlated with the hydropathy of the corresponding proteins. These correlations arise because the frequencies of hydrophobic and amphipathic amino acids increase, whereas those of hydrophilic amino acids decrease, with increasing <GC3>. Hydropathy values of prokaryotic proteins are systematically higher than those of eukaryotic proteins, but the slopes of the regression lines are identical. The lower hydrophobicity of eukaryotic proteins is due to differences in amino acid composition. In particular, the twofold higher cysteine (and disulfide bond) level of eukaryotic proteins relative to prokaryotic proteins most probably compensates for their lower hydrophobicity. This supports the view that hydrophobicity plays a structural and functional role in protein stability.
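
The two quantities compared in the abstract, GC content at each codon position and protein hydropathy, can be illustrated with a minimal sketch. This is not the authors' pipeline. The abstract does not say which hydropathy scale was used, so the Kyte-Doolittle index is an assumption here, and the function names `gc_by_position` and `mean_hydropathy` are hypothetical. The sketch also assumes an in-frame coding sequence and does not treat start or stop codons specially.

```python
# Kyte-Doolittle hydropathy index per amino acid (one-letter codes).
# ASSUMPTION: the abstract does not name the scale used in the paper.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gc_by_position(cds):
    """Return (GC1, GC2, GC3): the G+C fraction at codon positions 1, 2, 3.

    Assumes `cds` is in frame; trailing bases that do not complete a codon
    are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence holds no complete codon")
    fractions = []
    for offset in range(3):
        bases = cds[offset:usable:3]  # every third base, starting at offset
        gc_count = sum(base in "GC" for base in bases)
        fractions.append(gc_count / len(bases))
    return tuple(fractions)


def mean_hydropathy(protein):
    """Return the mean Kyte-Doolittle value over the residues of `protein`.

    Characters outside the 20 standard amino acids are skipped.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("no standard amino acids in sequence")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy input only, not data from the paper: ATG GCC GGC encodes Met-Ala-Gly.
    gc1, gc2, gc3 = gc_by_position("ATGGCCGGC")
    print(f"GC1={gc1:.2f} GC2={gc2:.2f} GC3={gc3:.2f}")
    print(f"mean hydropathy={mean_hydropathy('MAG'):.2f}")
```

Computing these values per gene, or averaging them per genome to obtain <GC3> and a mean hydropathy, and then regressing one on the other would give the kind of correlation the abstract describes. The regression step is omitted here.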

MeSH terms

  • Amino Acids / chemistry
  • Base Composition*
  • Base Sequence
  • Codon
  • Genome
  • Proteins / chemistry*
  • Proteins / genetics

Substances

  • Amino Acids
  • Codon
  • Proteins