High solubility of random-sequence proteins consisting of five kinds of primitive amino acids

Protein Eng Des Sel. 2005 Jun;18(6):279-84. doi: 10.1093/protein/gzi034. Epub 2005 May 31.


Searching for functional proteins among random-sequence libraries is a major challenge of protein engineering; the difficulties include the poor solubility of many random-sequence proteins. A library in which most of the polypeptides are soluble and stable would therefore be of great benefit. Although modern proteins consist of 20 amino acids, it has been suggested that early proteins evolved from a reduced alphabet. Here, we have constructed a library of random-sequence proteins consisting of only five amino acids, Ala, Gly, Val, Asp and Glu, which are believed to have been the most abundant in the prebiotic environment. Expression and characterization of arbitrarily chosen proteins in the library indicated that five-alphabet random-sequence proteins have higher solubility than do 20-alphabet random-sequence proteins with a similar level of hydrophobicity. The results support the reduced-alphabet hypothesis of the primordial genetic code and should also be helpful in constructing optimized protein libraries for evolutionary protein engineering.
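The library design described above can be sketched in silico. The following is a minimal illustration, not the authors' procedure: it assumes residues are drawn uniformly and independently, and it uses the Kyte-Doolittle hydropathy scale as a stand-in for "hydrophobicity", since the abstract does not name the measure used. The function names (`random_protein`, `mean_hydropathy`, `make_library`) are hypothetical.

```python
import random

# Kyte-Doolittle hydropathy values. Using this scale is an assumption;
# the abstract does not say how hydrophobicity was quantified.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

PRIMITIVE = "AGVDE"          # Ala, Gly, Val, Asp, Glu (from the abstract)
FULL = "".join(sorted(KD))   # the 20 standard amino acids


def random_protein(alphabet, length, rng):
    """Draw one sequence with residues chosen uniformly (an assumption)."""
    return "".join(rng.choice(alphabet) for _ in range(length))


def mean_hydropathy(seq):
    """Average Kyte-Doolittle hydropathy over all residues of seq."""
    return sum(KD[aa] for aa in seq) / len(seq)


def make_library(alphabet, size, length, seed=0):
    """Build a reproducible list of random sequences over the alphabet."""
    rng = random.Random(seed)
    return [random_protein(alphabet, length, rng) for _ in range(size)]


if __name__ == "__main__":
    five = make_library(PRIMITIVE, size=1000, length=100)
    twenty = make_library(FULL, size=1000, length=100)
    for name, lib in (("5-letter", five), ("20-letter", twenty)):
        avg = sum(mean_hydropathy(s) for s in lib) / len(lib)
        print(f"{name} library: mean hydropathy {avg:.2f}")
```

A sketch like this only helps pick sequence sets with comparable average hydropathy; solubility itself was measured experimentally in the study and cannot be read off from these scores.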

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Amino Acids / chemistry*
  • Base Sequence
  • Cloning, Molecular
  • Escherichia coli
  • Evolution, Molecular
  • Gene Library*
  • Hydrophobic and Hydrophilic Interactions
  • Molecular Sequence Data
  • Protein Biosynthesis
  • Protein Engineering*
  • Proteins / chemistry*
  • Proteins / genetics
  • Proteins / metabolism
  • Solubility


Substances

  • Amino Acids
  • Proteins