The earliest proteins had to rely on the amino acids available on early Earth, before the biosynthetic pathways for more complex amino acids evolved. In extant proteins, many of the 'late' amino acids (such as Arg, Lys, His, Cys, Trp and Tyr) serve as essential catalytic and structure-stabilizing residues. Whether, and how, early proteins could have sustained an early biosphere has therefore been a major puzzle. Here, we analysed two combinatorial protein libraries that serve as proxies for the sequence space available at two different evolutionary stages. The first uses the full alphabet of 20 amino acids, whereas the second uses only 10 residues (ASDGLIPTEV), representing a consensus view of the amino acids plausibly available through prebiotic chemistry. We show that compact, proteolysis-resistant conformations are, surprisingly, similarly abundant in both libraries. In addition, the early-alphabet proteins are inherently more soluble and refoldable, independently of the activity of the general chaperone Hsp70. By contrast, chaperones significantly increase the otherwise poor solubility of the modern-alphabet proteins, suggesting that chaperones coevolved with the amino acid repertoire. Our work indicates that, while both early and modern amino acids are predisposed to support protein structure, they do so with different biophysical properties and via different mechanisms.
Keywords: amino acid alphabet; genetic code evolution; protein sequence space; protein structure; random proteins.