PLoS One. 2012;7(2):e30824.
doi: 10.1371/journal.pone.0030824. Epub 2012 Feb 1.

Polyglutamine repeats are associated to specific sequence biases that are conserved among eukaryotes


Matteo Ramazzotti et al. PLoS One. 2012.

Abstract

Nine human neurodegenerative diseases, including Huntington's disease and several spinocerebellar ataxias, are associated with the aggregation of proteins that contain an extended tract of consecutive glutamine residues (polyQs) once the tract exceeds a certain length threshold. This event is believed to be the consequence of the expansion of polyCAG codons during replication. This is in apparent contradiction with the fact that many polyQ-containing proteins remain soluble and are encoded by invariant genes in a number of eukaryotes. The latter suggests that polyQ expansion and/or aggregation might be counter-selected through a genetic and/or protein context. To identify this context, we designed software that scans entire proteomes in search of imperfect polyQs. The nature of the residues flanking the polyQs and that of the residues other than Gln within polyQs (insertions) were assessed. We discovered strong amino acid residue biases robustly associated with polyQs in the 15 eukaryotic proteomes we examined, with an over-representation of Pro, Leu and His and an under-representation of Asp, Cys and Gly. These biases are conserved among unrelated proteins and are independent of specific functional classes. Our findings suggest that specific residues have been co-selected with polyQs during evolution. We discuss the possible selective pressures responsible for the observed biases.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Schematic of a polyQ.
A polyQ contains a minimum of five consecutive Q residues. The maximum proportion of residues other than Q (insertions) is 25% and each insertion cannot be over 5-residues long. The N- and C-terminal flanks of the polyQ are labeled Nt and Ct flanks, respectively. The numbering scheme for the residues within the flanks is shown.
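The definition in this caption can be illustrated with a minimal sketch. It is not the authors' software: the greedy extension strategy, the function name `find_polyqs`, and the constant names are assumptions made for illustration. Only the three numeric limits (five consecutive Q, 25% insertions, insertions of at most 5 residues) come from the caption.

```python
import re

MIN_SEED = 5            # minimum run of consecutive Q residues (from the caption)
MAX_INSERT_LEN = 5      # maximum length of a single insertion (from the caption)
MAX_INSERT_FRAC = 0.25  # maximum proportion of non-Q residues (from the caption)


def _allowed(start, end, q_count, gap):
    """Check the insertion-length and insertion-proportion limits for a candidate span."""
    length = end - start
    return gap <= MAX_INSERT_LEN and (length - q_count) / length <= MAX_INSERT_FRAC


def find_polyqs(seq):
    """Return half-open (start, end) spans of imperfect polyQs in a protein sequence.

    Assumed strategy: seed on a pure run of >= MIN_SEED glutamines, then greedily
    absorb neighbouring Q runs on either side while both limits still hold.
    """
    runs = [m.span() for m in re.finditer(r"Q+", seq)]
    spans = []
    next_free = 0  # index of the first Q run not absorbed by an earlier polyQ
    for i, (s, e) in enumerate(runs):
        if i < next_free or e - s < MIN_SEED:
            continue
        lo = hi = i
        start, end, q = s, e, e - s
        grew = True
        while grew:
            grew = False
            if hi + 1 < len(runs):  # try to extend towards the C-terminus
                ns, ne = runs[hi + 1]
                if _allowed(start, ne, q + ne - ns, ns - end):
                    end, q, hi, grew = ne, q + ne - ns, hi + 1, True
            if lo - 1 >= next_free:  # try to extend towards the N-terminus
                ps, pe = runs[lo - 1]
                if _allowed(ps, end, q + pe - ps, start - pe):
                    start, q, lo, grew = ps, q + pe - ps, lo - 1, True
        spans.append((start, end))
        next_free = hi + 1
    return spans
```

For the made-up sequence `MAQQQQQQPQQQHQQLLDK`, the sketch reports a single imperfect polyQ covering `QQQQQQPQQQHQQ`, with `P` and `H` as insertions.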
Figure 2. Size distribution of the polyQs in the human proteome.
Black, imperfect polyQs; dark gray, pure polyQs; light gray, pure polyCAGs.
Figure 3. PolyQs associated with human diseases are significantly longer than those that are not.
(A) Length of the imperfect polyQs. (B) Length of the longest pure polyQ within each polyQ zone. (C) Length of the longest pure polyCAG within each polyQ zone. ***, p<0.001; **, p<0.01; *, p<0.05 (Mann-Whitney and Kolmogorov-Smirnov tests).
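As a rough illustration of the rank-based comparison named in this caption, the sketch below computes the Mann-Whitney U statistic by hand. The function name `mann_whitney_u` and the example lengths are hypothetical and are not data from the paper. In practice a p-value would come from a statistics library such as `scipy.stats.mannwhitneyu`.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using average ranks for ties."""
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    return u_x, n1 * n2 - u_x


# Hypothetical polyQ lengths, for illustration only.
disease_lengths = [40, 45, 52]
other_lengths = [10, 12, 15, 20]
u_disease, u_other = mann_whitney_u(disease_lengths, other_lengths)
```

Here `u_disease` counts the pairs in which a disease-associated length exceeds a non-disease length (ties count one half), so a value close to `len(x) * len(y)` indicates that the first group is consistently longer.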
Figure 4. Sequence biases associated with polyQs in the human proteome.
(A) Sequence biases within polyQs (insertions). The relative abundances of each residue type within polyQs are represented. (B–G) Sequence biases at the flanks of polyQs. Each point represents the relative abundance of Pro (B), His (C), Leu (D), Asp (E), Cys (F) and Gly (G) at each of the 30 positions within the Nt (black circles) and Ct (gray circles) flanks. The relative abundances within the polyQ insertions are also indicated (stars). The solid lines are the best fit to an exponential function. The dotted lines mark the thresholds for over-representation (residue twice as frequent as in the proteome) and under-representation (residue half as frequent as in the proteome).
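The thresholds in this caption can be made concrete with a short sketch. It assumes that "relative abundance" means the frequency of a residue in the region of interest divided by its frequency in the whole proteome; the exact normalization is not stated in this record. The function names and toy sequences are hypothetical.

```python
from collections import Counter


def relative_abundance(region, background):
    """Frequency of each residue in `region` divided by its frequency in `background`.

    Assumption: this ratio is what the caption calls relative abundance.
    """
    if not region or not background:
        raise ValueError("both sequences must be non-empty")
    region_counts = Counter(region)
    ratios = {}
    for residue, bg_count in Counter(background).items():
        region_freq = region_counts[residue] / len(region)
        background_freq = bg_count / len(background)
        ratios[residue] = region_freq / background_freq
    return ratios


def classify(ratio, fold=2.0):
    """Apply the caption's thresholds: twice as frequent, or half as frequent."""
    if ratio >= fold:
        return "over"
    if ratio <= 1.0 / fold:
        return "under"
    return "neutral"


# Toy sequences, for illustration only.
ratios = relative_abundance("PPLA", "PLHDCGQQAA")
labels = {residue: classify(value) for residue, value in ratios.items()}
```

With these toy inputs Pro is five times as frequent in the region as in the background and is labelled `over`, whereas Asp is absent from the region and is labelled `under`.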
Figure 5. The polyQ zone.
PolyQ insertions and flanks share the same residue biases: Pro, Leu and His are over-represented within these zones, while Asp, Cys and Gly are under-represented.
Figure 6. Sequence biases associated with polyQs are conserved throughout eukaryotic proteomes.
The relative abundances of Pro (A), His (B), Leu (C), Asp (D), Cys (E) and Gly (F) within polyQ insertions in the 15 eukaryotic proteomes we analyzed are represented. The dotted lines mark the thresholds for over-representation (residue twice as frequent as in the proteome) and under-representation (residue half as frequent as in the proteome).
Figure 7. Eukaryotic polyQ-containing proteins are not orthologous.
For each organism, the number of polyQ-containing proteins that are orthologous to human polyQ-containing proteins is shown in black; the number that are not is shown in gray.
Figure 8. A survey of additional biases within polyQs.
(A) Codon biases within polyQ insertions. The relative abundances of the different Pro, His and Leu codons within polyQ insertions in human, chimpanzee, orangutan, mouse and dog are represented. The dotted lines mark the thresholds for over-representation (codon twice as frequent as in the rest of the open reading frames) and under-representation (codon half as frequent as in the rest of the open reading frames). (B) Sequence biases within human polyQs are not specific to transcription factors. The relative abundance of each residue within polyQs is represented for all human polyQs (black), for human polyQs tagged by the GO term “regulation of transcription DNA-dependent” (dark gray) and for human polyQs that are not tagged by that GO term (light gray). The dotted lines mark the thresholds for over-representation (residue twice as frequent as in the proteome) and under-representation (residue half as frequent as in the proteome).
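The codon comparison in panel A can be sketched along the same lines. This is an illustration under stated assumptions, not the authors' procedure: "the rest of the open reading frame" is taken to mean every codon outside the insertion positions, and the function names, the `insertion_positions` parameter and the toy ORF are hypothetical.

```python
from collections import Counter


def split_codons(orf):
    """Split an in-frame nucleotide sequence into codons, dropping a trailing partial codon."""
    usable = len(orf) - len(orf) % 3
    return [orf[i:i + 3] for i in range(0, usable, 3)]


def codon_relative_abundance(orf, insertion_positions):
    """Ratio of codon frequency inside the insertions to that in the rest of the ORF.

    `insertion_positions` holds 0-based codon indices of insertion residues.
    Codons never seen outside the insertions are skipped to avoid dividing by zero.
    """
    codons = split_codons(orf)
    inside_idx = set(insertion_positions)
    inside = Counter(c for i, c in enumerate(codons) if i in inside_idx)
    outside = Counter(c for i, c in enumerate(codons) if i not in inside_idx)
    n_in, n_out = sum(inside.values()), sum(outside.values())
    if n_in == 0 or n_out == 0:
        return {}
    return {
        codon: (count / n_in) / (outside[codon] / n_out)
        for codon, count in inside.items()
        if outside[codon] > 0
    }


# Toy ORF, for illustration only: the Pro codon CCG at codon index 4 is the insertion.
toy_orf = "ATGCCGCAGCAGCCGCAGCCTCCTTAA"
codon_ratios = codon_relative_abundance(toy_orf, [4])
```

In this toy case CCG makes up the whole insertion but only one of the eight remaining codons, so its ratio is 8 and it would fall above the over-representation threshold.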


