Key Role of Amino Acid Repeat Expansions in the Functional Diversification of Duplicated Transcription Factors

Mol Biol Evol. 2015 Sep;32(9):2263-72. doi: 10.1093/molbev/msv103. Epub 2015 Apr 29.


The high regulatory complexity of vertebrates has been related to two rounds of whole genome duplication (2R-WGD) that occurred before the divergence of the major vertebrate groups. Following these events, many developmental transcription factors (TFs) were retained in multiple copies and subsequently specialized in diverse functions, whereas others reverted to their singleton state. TFs are known to be generally rich in amino acid repeats or low-complexity regions (LCRs), such as polyalanine or polyglutamine runs, which can evolve rapidly and potentially influence the transcriptional activity of the protein. Here we test the hypothesis that LCRs have played a major role in the diversification of TF gene duplicates. We find that nearly half of the TF gene families that originated during the 2R-WGD contain LCRs. Among the gene duplicates analyzed, 155 of 550 (28%) contain LCRs, about twice the proportion found among single-copy genes (15 of 115, 13%). In addition, duplicated TFs preferentially accumulate certain LCR types, the most prominent of which are alanine repeats. We experimentally test the role of alanine-rich LCRs in two different TF gene families, PHOX2A/PHOX2B and LHX2/LHX9. In both cases, the presence of the alanine-rich LCR in one of the copies (PHOX2B and LHX2) significantly increases the capacity of the TF to activate transcription. Taken together, the results provide strong evidence that LCRs are important driving forces of evolutionary change in duplicated genes.
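
The "about twice" comparison in the abstract concerns proportions rather than raw counts. A minimal sketch of that arithmetic follows, using only the four counts quoted above. The variable names are illustrative assumptions, and this is not the authors' analysis code or statistical test.

```python
# Counts quoted in the abstract; variable names are hypothetical.
dup_with_lcr, dup_total = 155, 550        # duplicated TF genes
single_with_lcr, single_total = 15, 115   # single-copy TF genes

# Fraction of genes in each class that contain at least one LCR.
dup_frac = dup_with_lcr / dup_total
single_frac = single_with_lcr / single_total

# Fold difference between the two proportions.
fold = dup_frac / single_frac

print(f"duplicates with LCRs:  {dup_frac:.0%}")     # 28%
print(f"single-copy with LCRs: {single_frac:.0%}")  # 13%
print(f"fold difference:       {fold:.2f}")         # 2.16
```

The fold difference of roughly 2.2 is what the abstract summarizes as "about twice"; whether that difference is statistically significant is not something this sketch assesses.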

Keywords: PHOX2B; gene duplication; low-complexity region; polyalanine; transcription factor.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Evolution, Molecular
  • Gene Duplication
  • Humans
  • LIM-Homeodomain Proteins / genetics*
  • Phylogeny
  • Transcription Factors / genetics*
  • Transcriptional Activation
  • Trinucleotide Repeat Expansion*


Substances

  • LIM-Homeodomain Proteins
  • Transcription Factors