We analyzed codon usage in poly-Q stretches of different lengths in the human proteome. First, we found that all long poly-Q stretches in the Protein Data Bank (PDB) belong to disordered regions. Second, we found a codon usage bias for glutamine homo-repeats in the human proteome: whenever a single codon is used throughout a poly-Q stretch, that codon is CAG. Similar results were obtained for human proteins with glutamine homo-repeats that are associated with diseases. Moreover, among disease-associated proteins (from the HraDis database), the fraction of proteins in which a single codon encodes the glutamine homo-repeat is lower (22%) than among proteins from the human proteome (26%). For poly-Q stretches in the human proteome, we demonstrated that in 28 cases the splicing sites correspond to the homo-repeats, and in 11 of these cases the sites appear at the -terminal part of the homo-repeats, with statistical significance 10 .
Keywords: Homo-repeat; codon usage; disease; proteome; splicing site.