Background: Eukaryotic promoters are regions containing various sequence motifs necessary to control gene transcription. A growing body of evidence shows that structural and/or contextual changes in regulatory elements can critically affect cis-regulatory activity. As sequence motifs can be key factors in maintaining complex promoter architectures, one effective approach to further understand the evolution of promoter regions in vertebrates is to compare the abundance and distribution patterns of sequence motifs in these regions between divergent species. Compared with mammals, the chicken (Gallus gallus) has a very different genome composition, and enough genomic information is available to make it a good model for the exploration of promoter structure and evolution.
Results: More than 10% of chicken genes contained short tandem repeats (STRs) in the region 2 kb upstream of promoters, but the total number of STRs observed in chicken was approximately half of that detected in human promoters. In terms of STR motif frequencies, chicken promoter regions were more similar to other avian and mammalian promoters than they were to the chicken genome as a whole. Unlike other STRs, nearly half of the trinucleotide repeats found in promoters partly or entirely overlapped with CpG islands, indicating a potential association with nucleosome positions. Moreover, chicken promoters are rich in sequence motifs such as poly-A, poly-G and G-quadruplexes, especially in the core region, that are otherwise rare in the genome. Most of the sequence motifs showed strong functional enrichment for particular gene ontology (GO) categories, indicating roles in regulation of transcription and gene expression, as well as immune response and cognition.
Conclusions: Chicken promoter regions share some, but not all, of the structural features observed in mammalian promoters. The findings presented here provide empirical evidence suggesting that the frequencies and locations of STR motifs have been conserved through promoter evolution in a lineage-specific manner. Correlation analysis between GO categories and sequence motifs suggests motif-specific constraints acting on gene function.
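The motif classes named in the Results (STRs, poly-A and poly-G tracts, G-quadruplexes) can be made concrete with a minimal Python sketch. This is an illustration only, not the pipeline used in the study: the abstract does not state its detection criteria, so the thresholds `MIN_STR_COPIES` and `MIN_HOMOPOLYMER`, the 2-6 bp unit range, the widely used G-quadruplex consensus pattern, and the function name `find_motifs` are all assumptions made here.

```python
import re

# Illustrative thresholds (assumptions; the source does not give its criteria).
MIN_STR_COPIES = 4    # minimum number of tandem copies to call an STR
MIN_HOMOPOLYMER = 8   # minimum length of a poly-A or poly-G tract

# STR: a 2-6 bp unit followed by at least MIN_STR_COPIES - 1 further copies.
STR_RE = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (MIN_STR_COPIES - 1))
# Homopolymer tracts of A or G.
POLY_RE = re.compile(r"A{%d,}|G{%d,}" % (MIN_HOMOPOLYMER, MIN_HOMOPOLYMER))
# Commonly used G-quadruplex consensus: four runs of >= 3 G separated by
# loops of 1-7 nt (forward strand only in this sketch).
G4_RE = re.compile(r"G{3,}(?:[ACGT]{1,7}?G{3,}){3}")


def find_motifs(seq):
    """Return (kind, start, end, detail) tuples; end is exclusive."""
    seq = seq.upper()
    hits = []
    for m in STR_RE.finditer(seq):
        unit = m.group(1)
        # Skip units that are repeats of a shorter unit (e.g. "AA"),
        # so homopolymer runs are not reported as dinucleotide STRs.
        if unit in (unit + unit)[1:-1]:
            continue
        hits.append(("STR", m.start(), m.end(), unit))
    for m in POLY_RE.finditer(seq):
        base = m.group(0)[0]
        hits.append(("poly-" + base, m.start(), m.end(), base))
    for m in G4_RE.finditer(seq):
        hits.append(("G4", m.start(), m.end(), m.group(0)))
    return hits


if __name__ == "__main__":
    # Toy sequence, not real promoter data.
    toy = "TC" + "CAG" * 5 + "TC" + "A" * 10 + "TC" + "GGGAGGGTTGGGCAGGG" + "TC"
    for hit in find_motifs(toy):
        print(hit)
```

A genome-scale analysis would also scan the reverse strand, handle ambiguous bases, and intersect the hits with promoter and CpG island coordinates. None of that is attempted here.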