A census of protein repeats
- PMID: 10512723
- DOI: 10.1006/jmbi.1999.3136
Abstract
In this study, we analyzed all known protein sequences for repeating amino acid segments. Although duplicated sequence segments occur in 14% of all proteins, eukaryotic proteins are three times more likely to have internal repeats than prokaryotic proteins. After clustering the repetitive sequence segments into families, we find that repeats from eukaryotic proteins have little similarity with prokaryotic repeats, suggesting that most repeats arose after the prokaryotic and eukaryotic lineages diverged. Consequently, protein classes with the highest incidence of repetitive sequences perform functions unique to eukaryotes. The frequency distribution of the repeating units shows only weak length dependence, implicating recombination rather than duplex melting or DNA hairpin formation as the limiting mechanism underlying repeat formation. The mechanism favors additional repeats once an initial duplication has been incorporated. Finally, we show that repetitive sequences containing small and relatively water-soluble residues are favored. We propose that error-prone repeat expansion allows repetitive proteins to evolve more quickly than non-repeat-containing proteins.
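To make the idea of a "duplicated sequence segment" concrete, the following is a minimal sketch that flags a protein as repeat-containing when some fixed-length segment occurs at least twice without overlapping itself. It is an illustration only: the abstract does not describe the detection procedure, and exact k-mer matching is a simplification that would miss diverged repeats. The function names, the segment length `k`, and the example sequence are all hypothetical.

```python
from collections import defaultdict


def duplicated_segments(sequence, k=4):
    """Return {segment: [start positions]} for every length-k segment that
    occurs at least twice in `sequence` without overlapping itself.

    Assumption: exact matching only; this is not the method of the study.
    """
    positions = defaultdict(list)
    for start in range(len(sequence) - k + 1):
        positions[sequence[start:start + k]].append(start)

    repeats = {}
    for segment, starts in positions.items():
        # Keep an occurrence only if it starts at least k residues after
        # the previously kept one, so kept copies never overlap.
        kept = []
        for s in starts:
            if not kept or s - kept[-1] >= k:
                kept.append(s)
        if len(kept) >= 2:
            repeats[segment] = kept
    return repeats


def has_internal_repeat(sequence, k=4):
    """True if the sequence contains at least one duplicated length-k segment."""
    return bool(duplicated_segments(sequence, k))


# Hypothetical toy sequence with three tandem copies of "AGSP".
example = "MKTAGSPAGSPAGSPLLV"
print(duplicated_segments(example, k=4))
print(has_internal_repeat("MKTLLV", k=4))
```

Applying `has_internal_repeat` across a sequence collection and dividing the number of hits by the number of sequences would give a crude analogue of the per-proteome repeat incidence discussed in the abstract, again under the exact-match assumption.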
Copyright 1998 Academic Press.
