Genomic and evolutionary insights into genes encoding proteins with single amino acid repeats

Mol Biol Evol. 2006 Jul;23(7):1357-69. doi: 10.1093/molbev/msk022. Epub 2006 Apr 17.

Abstract

Mutations causing expansion of amino acid repeats are responsible for 19 hereditary disorders, and repeats in several other proteins also show length variations. These observations prompted us to identify single amino acid repeat-containing proteins (SARPs) in humans and to understand their functional and evolutionary significance. We identified 8812 SARPs containing 17 146 repeat domains, each harboring 4 or more residues. In all, 5% of SARPs (471) showed repeat length variations, and nearly 84% of these (394) have repeats of 10 residues or fewer. We find that SARPs are involved in functions that require the formation of multiprotein complexes. Nearly 78% (6859) of the SARPs had no identifiable paralogue in the human proteome; we term these orphan SARPs. Compared with SARPs belonging to protein families, orphan SARPs show longer repeat stretches, longer peptides, and lower expression levels. Because the intensity of gene expression is known to correlate inversely with the rate of protein sequence evolution, our results suggest that orphan SARPs evolve faster than the familial forms and are therefore under weaker selection pressure. We also find that although GC-rich codons are favored for coding the repeat tracts of SARPs, specific codons, not nucleotide motifs per se, are selected, suggesting functional constraints on codon usage. One such constraint could be mRNA stability: clustering of rare codons is known to destabilize transcripts, and rare codons are not favored for coding repeat tracts. Genes encoding polymorphic SARPs are preferentially located toward telomeric segments. Further, the sex-specific recombination rates of the chromosomal loci strongly correlate with the parental sex that influences repeat instability in disorders caused by dynamic mutation. Therefore, instability associated with repeats might be driven by processes specific to sperm or oocyte development, and recombination frequency might play a positive role in this process.
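To make the repeat criterion and the codon GC measure concrete, here is a minimal sketch. It assumes a repeat domain is a maximal, uninterrupted run of one amino acid of at least 4 residues (the threshold stated above) and that the coding sequence is supplied in frame, one codon per residue. The helper names `find_repeats` and `tract_gc` are hypothetical; this is not the authors' detection pipeline, which may, for example, tolerate interrupted repeats.

```python
import re
from typing import List, Tuple


def find_repeats(protein: str, min_len: int = 4) -> List[Tuple[str, int, int]]:
    """Hypothetical helper: return (residue, 0-based start, length) for each
    maximal run of one identical residue at least `min_len` long.
    Interrupted or impure repeats are deliberately NOT detected here."""
    # ([A-Z]) captures a residue; \1{n,} requires n or more further copies of it.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in pattern.finditer(protein.upper())]


def tract_gc(cds: str, start: int, length: int) -> float:
    """Hypothetical helper: GC fraction of the codons encoding residues
    start .. start+length-1. Assumes `cds` is the in-frame coding sequence
    matching the protein codon-for-residue (no introns, no offset)."""
    tract = cds.upper()[3 * start: 3 * (start + length)]
    if not tract:
        raise ValueError("repeat lies outside the coding sequence")
    return sum(base in "GC" for base in tract) / len(tract)


if __name__ == "__main__":
    protein = "MAQQQQQL"
    cds = "ATGGCTCAGCAGCAACAGCAGCTG"  # toy in-frame CDS: M A Q Q Q Q Q L
    for residue, start, length in find_repeats(protein):
        print(residue, start, length, round(tract_gc(cds, start, length), 2))
    # prints: Q 2 5 0.6
```

In the toy example, the polyglutamine tract is encoded by four CAG codons and one CAA codon, giving 9 G/C bases out of 15. Comparing such per-tract values against the same gene's non-repeat codons is one simple way to ask whether GC-rich codons are favored in repeat tracts.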

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Codon / genetics
  • Computational Biology / methods
  • Evolution, Molecular*
  • Female
  • GC Rich Sequence / genetics
  • Gene Frequency
  • Genome, Human / genetics
  • Genomics*
  • Humans
  • Male
  • Polymorphism, Genetic / genetics
  • Proteins / genetics*
  • Repetitive Sequences, Amino Acid / genetics*
  • Trinucleotide Repeat Expansion / genetics
  • Trinucleotide Repeats / genetics

Substances

  • Codon
  • Proteins