Biases in arginine codon usage correlate with genetic disease risk

Genet Med. 2020 Aug;22(8):1407-1412. doi: 10.1038/s41436-020-0813-6. Epub 2020 May 6.


Purpose: The persistence of hypermutable CGN (CGG, CGA, CGC, CGU) arginine codons at high frequency suggests the possibility of negative selective pressure at these sites and that arginine codon usage could be a predictive indicator of human disease genes.

Methods: We analyzed arginine codons (CGN, AGG, AGA) from all canonical Ensembl protein-coding gene transcripts, then compared the frequency of CGN codons between genes with and without human disease associations and against gnomAD constraint metrics.
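As an illustration of the per-gene metric described in the Methods, the following minimal Python sketch computes the fraction of a transcript's arginine codons that are CGN. It is not the authors' pipeline: the function name `cgn_fraction`, the assumption of an in-frame coding sequence supplied as a plain string, and the handling of genes with no arginine codons are all hypothetical choices made for this example.

```python
from collections import Counter

# Arginine codons in the DNA alphabet (CGU in RNA corresponds to CGT here).
CGN_CODONS = {"CGG", "CGA", "CGC", "CGT"}
ARG_CODONS = CGN_CODONS | {"AGG", "AGA"}


def cgn_fraction(cds):
    """Return (CGN codons) / (all arginine codons) for an in-frame coding sequence.

    Returns None when the sequence contains no arginine codons, because the
    ratio is undefined in that case (an assumption of this sketch).
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")

    # Split into non-overlapping codons starting at the first base.
    codons = Counter(seq[i:i + 3] for i in range(0, len(seq), 3))

    total_arg = sum(codons[c] for c in ARG_CODONS)
    if total_arg == 0:
        return None
    return sum(codons[c] for c in CGN_CODONS) / total_arg
```

For a toy sequence such as `ATGCGGAGACGATAA` (codons ATG, CGG, AGA, CGA, TAA), two of the three arginine codons are CGN, so the function returns 2/3.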

Results: The frequency of CGN codons among a gene's total arginine codon count was higher in genes linked to syndromic autism spectrum disorder (ASD) than in genes not associated with ASD. A comparison of genes annotated as dominant or recessive with control genes not matching either classification revealed a progressive increase in CGN codon frequency. Moreover, CGN frequency was positively correlated with a gene's probability of loss-of-function intolerance (pLI) score and negatively correlated with observed-over-expected ratios for both loss-of-function and missense variants.
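The abstract does not state which correlation statistic was used. As a hedged sketch only, a rank-based (Spearman) correlation is one common way to relate a per-gene CGN frequency to a constraint metric such as pLI. The helper names `average_ranks`, `pearson`, and `spearman` are hypothetical, and the inputs are assumed to be two equal-length lists of per-gene values with some variation in each.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))
```

Under this sketch, a positive `spearman(cgn_frequencies, pli_scores)` would correspond to the direction reported for pLI, and a negative value against observed-over-expected ratios would correspond to the direction reported for those metrics.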

Conclusion: Our findings indicate that genes utilizing CGN arginine codons rather than AGG or AGA are more likely to underlie single-gene disorders, particularly for dominant phenotypes, and thus constitute candidate genes for the study of human genetic disease.

Keywords: arginine substitution; autism spectrum disorders; codon usage; de novo.

MeSH terms

  • Arginine* / genetics
  • Autism Spectrum Disorder* / genetics
  • Bias
  • Codon Usage
  • Escherichia coli / genetics
  • Humans


Substances

  • Arginine