High GC content causes orphan proteins to be intrinsically disordered

PLoS Comput Biol. 2017 Mar 29;13(3):e1005375. doi: 10.1371/journal.pcbi.1005375. eCollection 2017 Mar.


De novo creation of protein-coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. At a bare minimum, these orphan proteins must not cause serious harm to the organism, meaning, for instance, that they should not aggregate. Therefore, although the creation of short ORFs could be truly random, their fixation should be subject to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported. In Drosophila, young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge, no valid explanation for this difference has been proposed. To solve this riddle, we studied the structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in properties between proteins of different ages. However, when we take GC content into account, we find that it can explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with the frequency of codons coding for disorder-promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for fixation of orphan proteins. Instead, these proteins largely resemble random proteins given a particular GC level. During evolution, the properties of a protein change faster than the GC level, causing the relationship between disorder and GC to gradually weaken.
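
The link between GC content and disorder-promoting residues can be illustrated with a small calculation. The sketch below is not the method used in the paper: it assumes a simple null model in which each nucleotide of a random ORF is drawn independently (G and C each with probability GC/2, A and T each with probability (1 - GC)/2), uses the standard genetic code, and treats the set A, R, G, Q, S, P, E, K as "disorder-promoting". That set is an illustrative assumption based on a commonly cited classification and is not necessarily the scale the authors used. The function name `expected_disorder_fraction` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# ASSUMPTION: illustrative set of disorder-promoting residues,
# not necessarily the classification used in the paper.
DISORDER_PROMOTING = set("ARGQSPEK")


def expected_disorder_fraction(gc: float) -> float:
    """Expected fraction of disorder-promoting residues in a random ORF.

    ASSUMPTION: nucleotides are independent; G and C each occur with
    probability gc / 2, A and T each with probability (1 - gc) / 2.
    Stop codons are excluded and the result is renormalised over sense codons.
    """
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must be between 0 and 1")
    base_prob = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}

    disorder_mass = 0.0  # probability mass of disorder-promoting codons
    sense_mass = 0.0     # probability mass of all non-stop codons
    for codon, aa in CODON_TABLE.items():
        p = base_prob[codon[0]] * base_prob[codon[1]] * base_prob[codon[2]]
        if aa == "*":
            continue
        sense_mass += p
        if aa in DISORDER_PROMOTING:
            disorder_mass += p
    return disorder_mass / sense_mass


if __name__ == "__main__":
    for gc in (0.3, 0.4, 0.5, 0.6, 0.7):
        print(f"GC = {gc:.1f}: {expected_disorder_fraction(gc):.3f}")
```

Under these assumptions the expected fraction rises with GC content, because the four-fold degenerate codon families for alanine (GCN), glycine (GGN), proline (CCN) and arginine (CGN) are all GC-rich. This is only a toy null model for "random proteins given a particular GC level"; it ignores codon bias, dinucleotide effects, and any real disorder predictor.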

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Composition
  • Computational Biology
  • Databases, Protein
  • Drosophila Proteins / chemistry
  • Drosophila Proteins / genetics
  • Evolution, Molecular
  • Gene Ontology
  • Intrinsically Disordered Proteins / chemistry*
  • Intrinsically Disordered Proteins / genetics*
  • Open Reading Frames
  • Phylogeny
  • Saccharomyces cerevisiae Proteins / chemistry
  • Saccharomyces cerevisiae Proteins / genetics
  • Selection, Genetic
  • Structural Homology, Protein


Substances

  • Drosophila Proteins
  • Intrinsically Disordered Proteins
  • Saccharomyces cerevisiae Proteins

Grants and funding

This work was supported by grants from the Swedish Research Council (http://www.vr.se/, VR-NT 2012-5046, VR-M 2010-3555) and the Swedish E-science Research Center (SeRC, www.e-science.se). Computational resources were provided by the Swedish National Infrastructure for Computing (SNIC, http://www.snic.vr.se/). SL was financed by Bioinformatics Infrastructure for Life Science (BILS, www.bils.se). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.