Is there codon usage bias for poly-Q stretches in the human proteome?

J Bioinform Comput Biol. 2019 Feb;17(1):1950010. doi: 10.1142/S0219720019500100.


We analyzed codon usage for poly-Q stretches of different lengths in the human proteome. First, we found that all long poly-Q stretches in the Protein Data Bank (PDB) belong to disordered regions. Second, we found a codon usage bias for glutamine homo-repeats in the human proteome: in the cases where the same codon is used throughout a poly-Q stretch, only CAG triplets are found. Similar results are obtained for human proteins with glutamine homo-repeats associated with diseases. Moreover, for disease-associated proteins (from the HraDis database), the fraction of proteins in which the same codon is used for glutamine homo-repeats is lower (22%) than for proteins from the human proteome (26%). We also demonstrated for poly-Q stretches in the human proteome that in some cases (28) the splicing sites correspond to the homo-repeats, and that in 11 cases these sites appear at the C-terminal part of the homo-repeats, with statistical significance 10⁻⁸.
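
The single-codon check described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `polyq_codon_runs`, the `min_len` threshold of 4 residues, and the toy coding sequence are assumptions made for illustration only. The one fixed fact it relies on is that glutamine is encoded by CAA and CAG in the standard genetic code.

```python
from itertools import groupby

# Glutamine (Q) is encoded by exactly two codons in the standard genetic code.
GLN_CODONS = {"CAA", "CAG"}


def polyq_codon_runs(cds, min_len=4):
    """Find poly-Q runs in an in-frame coding sequence.

    `min_len` (minimum run length in residues) is a hypothetical
    threshold chosen for this sketch, not a value from the abstract.
    """
    cds = cds.upper()
    # Split into codons, ignoring any trailing partial codon.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]

    runs = []
    pos = 0  # 0-based codon index of the current block
    for is_gln, group in groupby(codons, key=lambda c: c in GLN_CODONS):
        block = list(group)
        if is_gln and len(block) >= min_len:
            runs.append({
                "start": pos,
                "length": len(block),
                "codons": block,
                # True when one codon encodes the whole run.
                "single_codon": len(set(block)) == 1,
            })
        pos += len(block)
    return runs


# Toy sequence (invented): a pure-CAG run and a mixed CAG/CAA run.
toy_cds = "ATG" + "CAG" * 5 + "GCT" + "CAGCAACAGCAG" + "TAA"
for run in polyq_codon_runs(toy_cds):
    print(run["start"], run["length"], run["single_codon"])
```

Applied to real coding sequences, counting the runs with `single_codon` set and tallying which codon they use would give the kind of fraction reported above; the actual run-length cutoffs and data sources would have to follow the paper.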

Keywords: Homo-repeat; codon usage; disease; proteome; splicing site.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Codon Usage / genetics*
  • Computational Biology
  • Databases, Protein / statistics & numerical data
  • Humans
  • Intrinsically Disordered Proteins / chemistry
  • Intrinsically Disordered Proteins / genetics
  • Peptides / chemistry
  • Peptides / genetics*
  • Proteome / genetics*
  • Repetitive Sequences, Amino Acid


Substances

  • Intrinsically Disordered Proteins
  • Peptides
  • Proteome
  • polyglutamine