Motivation: It is widely known that the terminal residues of proteins (i.e. the N- and C-termini) are predominantly located on the protein surface and exposed to the solvent. However, the forces driving this phenomenon are not well understood. The common explanation, that terminal residues are charged and charged residues prefer to be on the surface, cannot account for the magnitude of the effect. Here, we survey a large number of proteins from the PDB to quantify this phenomenon, and then use a lattice model to study the mechanisms involved.
Results: The location of the termini was examined for 425 small monomeric proteins (50-200 amino acids). The average solvent accessibility of terminal residues is 87.1%, compared with 49.2% for charged residues and 35.9% for all residues. Using a cutoff of 50% of the maximal possible exposure, 80.3% of the N-terminal and 86.1% of the C-terminal residues are exposed, compared with 32% of all residues. In addition, terminal residues lie much farther from the center of mass of their proteins than other residues do. Using a 2D lattice, a large population of model proteins was studied at three levels: structural selection of compact structures, thermodynamic selection of conformations with a pronounced energy gap, and kinetic selection of fast-folding proteins using Monte Carlo simulations. Each successive selection step raises the proportion of proteins with termini on the surface, yielding proportions similar to those observed for real proteins.
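To make the lattice-level question concrete, the following is a minimal sketch of how one might check whether the termini of a 2D square-lattice conformation lie on the surface, and how far each residue lies from the center of mass. It is not the authors' model: the abstract does not say how "surface" is defined on the lattice, so the sketch assumes (hypothetically) that a residue is on the surface if at least one of its four neighbouring lattice sites is unoccupied. The function names are illustrative only, and the energy function, energy-gap criterion and Monte Carlo kinetics are not reproduced.

```python
from math import hypot
from typing import List, Tuple

Coord = Tuple[int, int]


def is_self_avoiding_walk(conf: List[Coord]) -> bool:
    """True if consecutive residues are lattice neighbours and no site is reused."""
    if len(set(conf)) != len(conf):
        return False
    return all(abs(x1 - x2) + abs(y1 - y2) == 1
               for (x1, y1), (x2, y2) in zip(conf, conf[1:]))


def surface_residues(conf: List[Coord]) -> List[bool]:
    """Assumed definition: a residue is on the surface if at least one of its
    four square-lattice neighbours is not occupied by the chain."""
    occupied = set(conf)
    exposed = []
    for x, y in conf:
        neighbours = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        exposed.append(any(n not in occupied for n in neighbours))
    return exposed


def termini_exposed(conf: List[Coord]) -> Tuple[bool, bool]:
    """Return (N-terminus on surface, C-terminus on surface)."""
    exposed = surface_residues(conf)
    return exposed[0], exposed[-1]


def distances_from_center_of_mass(conf: List[Coord]) -> List[float]:
    """Euclidean distance of each residue from the chain's center of mass
    (all residues given equal mass)."""
    cx = sum(x for x, _ in conf) / len(conf)
    cy = sum(y for _, y in conf) / len(conf)
    return [hypot(x - cx, y - cy) for x, y in conf]


if __name__ == "__main__":
    # Two maximally compact 9-residue conformations filling a 3x3 square.
    snake = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1),
             (0, 1), (0, 2), (1, 2), (2, 2)]
    spiral = [(1, 1), (1, 0), (0, 0), (0, 1), (0, 2),
              (1, 2), (2, 2), (2, 1), (2, 0)]
    for name, conf in [("snake", snake), ("spiral", spiral)]:
        assert is_self_avoiding_walk(conf)
        print(name, termini_exposed(conf))
    # snake (True, True); spiral (False, True): compactness alone
    # does not guarantee surface termini.
```

The two examples show why structural selection alone is not the whole story: both conformations are equally compact, yet the spiral buries its N-terminus in the single interior site. Counting such cases over a population of selected conformations is one way to estimate the proportion of model proteins with termini on the surface.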