Proteome-scale understanding of relationship between homo-repeat enrichments and protein aggregation properties

PLoS One. 2018 Nov 6;13(11):e0206941. doi: 10.1371/journal.pone.0206941. eCollection 2018.


Expansion of homo-repeats is a molecular basis for human neurological diseases. We are the first to study the influence of homo-repeats longer than four amino acid residues on the aggregation properties of 1,449,683 proteins across 122 eukaryotic and bacterial proteomes. Only 15% of these proteins (215,481) contain homo-repeats of such length. We demonstrated that RNA-binding proteins with a prion-like domain are enriched in homo-repeats compared with other non-redundant protein sequences and with proteins in the PDB. Our bioinformatics analysis of these proteins showed that proteins with homo-repeats are, on average, twice as long as those in the whole database. Moreover, we are the first to discover that, as a rule, homo-repeats occur in proteins not alone but in pairs: hydrophobic and aromatic homo-repeats co-occur with similar ones, whereas homo-repeats of small, polar, and charged amino acids co-occur with different preferences. We developed a new complementary approach to demonstrate the influence of homo-repeats on the aggregation properties of their host proteins. We have shown that adding artificial homo-repeats to natural and random proteins enhances the aggregation properties of these proteins. The maximal effect is observed when the inserted artificial homo-repeats are 5–6 residues long, which is consistent with the minimal length of an amyloidogenic region. We have also demonstrated that the ability of proteins with homo-repeats to aggregate cannot be explained solely by the presence of long homo-repeats. Other protein characteristics, such as the occurrence of homo-repeats in pairs within the same protein, should also contribute to aggregation. We are the first to develop an approach for studying how homo-repeats influence the aggregation properties of their host proteins, and we applied it to a large number of proteomes and proteins.
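The homo-repeat definition used above can be made concrete with a short Python sketch. It assumes that a homo-repeat is a maximal run of at least five identical one-letter residue codes (the abstract's "longer than four amino acid residues"). It also counts co-occurring repeat types per protein as a simplified stand-in for the pair analysis (only distinct residue types are paired), and shows how an artificial homo-repeat could be inserted into a sequence. The function names are hypothetical. This is not the authors' pipeline, and it includes no aggregation predictor.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a homo-repeat is a maximal run of >= MIN_LEN identical residues.
# "Longer than four residues" gives MIN_LEN = 5.
MIN_LEN = 5
# One residue letter followed by at least MIN_LEN - 1 copies of itself (greedy, so runs are maximal).
HOMO_REPEAT = re.compile(r"([A-Z])\1{%d,}" % (MIN_LEN - 1))


def find_homo_repeats(seq: str) -> list[tuple[str, int, int]]:
    """Return (residue, 0-based start, length) for every homo-repeat in seq."""
    return [
        (m.group(1), m.start(), len(m.group(0)))
        for m in HOMO_REPEAT.finditer(seq.upper())
    ]


def repeat_pairs(proteins: list[str]) -> Counter:
    """Hypothetical helper: count how often two different homo-repeat
    residue types occur together in the same protein."""
    counts: Counter = Counter()
    for seq in proteins:
        types = sorted({res for res, _, _ in find_homo_repeats(seq)})
        counts.update(combinations(types, 2))  # unordered pairs of distinct types
    return counts


def insert_homo_repeat(seq: str, residue: str, length: int, position: int) -> str:
    """Hypothetical helper: return a copy of seq with `length` copies of
    `residue` inserted before index `position`."""
    if not 0 <= position <= len(seq):
        raise ValueError("position out of range")
    return seq[:position] + residue * length + seq[position:]
```

For example, `insert_homo_repeat("MKAG", "Q", 5, 2)` returns `"MKQQQQQAG"`, in which `find_homo_repeats` reports a single five-residue Q repeat starting at index 2. If a repeat is inserted next to the same residue, the detected run will be longer than the inserted one. Measuring how such an insertion changes aggregation would require a separate aggregation predictor, which this sketch deliberately leaves out.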

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence / genetics
  • Bacteria / genetics
  • Bacterial Proteins / genetics
  • Computational Biology*
  • Databases, Genetic
  • Eukaryotic Cells
  • Humans
  • Protein Aggregates / genetics*
  • Proteome / genetics*
  • RNA-Binding Proteins / genetics*
  • Repetitive Sequences, Amino Acid


Substances

  • Bacterial Proteins
  • Protein Aggregates
  • Proteome
  • RNA-Binding Proteins

Grant support

This study was supported by the Russian Academy of Sciences program "Molecular and Cellular Biology," Grant number 01201353567, awarded to OG. This study was also supported by the Russian Science Foundation, Grant number 18-14-00321, awarded to OG. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.