Comparative Study
Genome Biol. 2006;7(7):R53.
doi: 10.1186/gb-2006-7-7-r53.

Comparative genomics of Drosophila and human core promoters


Peter C FitzGerald et al. Genome Biol. 2006.

Abstract

Background: The core promoter region plays a critical role in the regulation of eukaryotic gene expression. We have determined the non-random distribution of DNA sequences relative to the transcriptional start site in Drosophila melanogaster promoters to identify sequences that may be biologically significant. We compare these results with those obtained for human promoters.

Results: We determined the distribution of all 65,536 octamer (8-mer) DNA sequences in 10,914 Drosophila promoters and two sets of human promoters aligned relative to the transcriptional start site. In Drosophila, 298 8-mers have highly significant (p ≤ 1 × 10^-16) non-random distributions peaking within 100 base-pairs of the transcriptional start site. These sequences were grouped into 15 DNA motifs. Ten motifs, termed directional motifs, occur only on the positive strand while the remaining five motifs, termed non-directional motifs, occur on both strands. The only directional motifs to localize in human promoters are TATA, INR, and DPE. The directional motifs were further subdivided into those precisely positioned relative to the transcriptional start site and those that are positioned more loosely relative to the transcriptional start site. Similar numbers of non-directional motifs were identified in both species and most are different. The genes associated with all 15 DNA motifs, when they occur in the peak, are enriched in specific Gene Ontology categories and show a distinct mRNA expression pattern, suggesting that there is a core promoter code in Drosophila.
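The positional counting described in the Results can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `kmer_bin_counts`, the toy sequences, and the choice to index bins from the start of each sequence are assumptions made for illustration, and only the positive strand is scanned.

```python
from collections import defaultdict


def kmer_bin_counts(promoters, k=8, bin_size=20):
    """Count how often each k-mer starts in each positional bin.

    `promoters` is a list of equal-length sequences that are assumed to be
    aligned on the transcriptional start site already.
    Returns {kmer: {bin_index: count}} with 0-based bin indices.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in promoters:
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            # Skip windows containing N or any other non-ACGT symbol.
            if set(kmer) <= set("ACGT"):
                counts[kmer][start // bin_size] += 1
    return counts


# Toy, made-up sequences (not real promoters), with a small bin size
# so that the example stays readable.
toy = ["ACGTACGTACGTTATAAAAGGCC", "GGGGTATAAAAGACGTACGTACG"]
counts = kmer_bin_counts(toy, k=8, bin_size=10)
```

A k-mer whose counts pile up in one bin across many promoters, rather than spreading evenly, is the kind of non-random distribution the abstract refers to. The statistic used to score that concentration is not defined in this abstract, so it is not reproduced here.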

Conclusion: Drosophila and human promoters use different DNA sequences to regulate gene expression, supporting the idea that evolution occurs by the modulation of gene regulation.

Figures

Figure 1
The distribution of nucleotides across Drosophila and human promoters. The distribution of mononucleotides across the (a) 1,500 bp region of 10,914 Drosophila and (b) 15,011 and (c) 12,926 human promoters; the frequency of each mononucleotide is plotted against position (in 20 bp bins). The TSS occurs in bin 51 and its location is indicated. (d) The frequency of occurrence of the CA dinucleotide, at a single base-pair resolution across the 1,500 bp promoter region for all three datasets.
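The bin numbering in this caption can be checked with a short sketch. It assumes the 1,500 bp region runs from -1,000 to +499 (the range given in the Figure 11 caption), that the TSS sits at coordinate 0, and that bins are numbered from 1. The function name `position_to_bin` is hypothetical.

```python
def position_to_bin(pos, upstream=1000, bin_size=20):
    """Map a TSS-relative coordinate to a 1-based bin number.

    Assumes coordinates run from -upstream to +499 with the TSS at 0.
    """
    return (pos + upstream) // bin_size + 1


first_bin = position_to_bin(-1000)  # most upstream position
tss_bin = position_to_bin(0)        # the TSS itself
last_bin = position_to_bin(499)     # most downstream position
```

Under these assumptions the region splits into 75 bins and the TSS falls in bin 51, matching the caption.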
Figure 2
The localization of all 65,536 8-mers in Drosophila and human promoters. The clustering factors (CF or CF+) calculated for 20 bp bins plotted at the position of the most populated bin for all 65,536 8-mers. (a) CF for 10,914 Drosophila promoters; (b) CF for 15,011 human (UCSC) promoters; (c) CF for 12,926 human (DBTSS) promoters; (d) CF+ for 10,914 Drosophila promoters; (e) CF+ for 15,011 human (UCSC) promoters; (f) CF+ for 12,926 human (DBTSS) promoters.
Figure 3
Scatter plots showing the strand dependence of 8-mer localization, and the comparison of localization between different organisms (Drosophila and human). The clustering factors for all 8-mers, calculated for 20 bp bins, are plotted on the positive (CF+) versus the negative (CF-) strand for (a) Drosophila, (b) human (UCSC), and (c) human (DBTSS) promoters. The 256 palindromic sequences have equivalent CF+/CF- values but are plotted with a CF- value of -1. Comparison of CF values of 8-mers for (d) human (UCSC) versus Drosophila, (e) human (DBTSS) versus Drosophila, and (f) human (UCSC) versus human (DBTSS). Common elements should lie along the diagonal.
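The count of 256 palindromic sequences follows from the definition of a reverse-complement palindrome: the first four bases fix the last four, giving 4^4 = 256 of the 4^8 = 65,536 possible 8-mers. A brief enumeration confirms this. The helper name `reverse_complement` is illustrative.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


# Enumerate every 8-mer and keep those that read the same on both strands.
all_8mers = ["".join(p) for p in product("ACGT", repeat=8)]
palindromic = [s for s in all_8mers if s == reverse_complement(s)]
```

For these sequences an occurrence on the positive strand is necessarily an occurrence on the negative strand at the same site, which is why their CF+ and CF- values are equivalent.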
Figure 4
8-mer localization in Drosophila expressed as a probability term, and characteristics of the most statistically relevant 8-mers. (a) The probability term P = -log10(1 - p) for the 13,552 8-mers with a maximum bin containing ≥15 members. The 298 DNA sequences above the line at P = 16, a 1 in 1 × 10^16 (single sampling) chance of being random, were analyzed in more detail. (b) Clustering factors for both the positive (CF+) and negative strand (CF-) were plotted for the 298 most significant peaking 8-mers. The distribution falls into two distinct groupings; those that display a symmetric distribution on both strands (red circles) and those that cluster on only one strand (black circles). (c) A histogram showing the number of promoters containing each of the 15 motifs, grouped into three classes, DMp1 to 5, DMv1 to 5, and NDM1 to 5. We also present the common name and the consensus sequence.
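The probability term can be illustrated numerically. The sketch below takes the chance of being random, 1 - p, as its input rather than p itself: when p is within about 1 × 10^-16 of 1, forming 1 - p in double precision loses most of its accuracy. The function name and the threshold constant are illustrative, and how the authors computed p is not stated in this caption.

```python
import math

THRESHOLD = 16  # the line drawn at P = 16 in panel (a)


def probability_term(chance_of_random):
    """Return P = -log10(1 - p), given 1 - p directly."""
    return -math.log10(chance_of_random)


p_at_line = probability_term(1e-16)  # a 1 in 1 x 10^16 chance of being random
p_weaker = probability_term(1e-8)    # a much less significant 8-mer
```

On this scale a larger P means a less likely chance occurrence, so the 298 sequences analyzed in detail are those with P at or above the threshold.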
Figure 5
The 15 DNA motifs derived from grouping 298 octamers whose probability of having a non-random distribution was less than 1 × 10^-16. The table is grouped into two panels. (a) presents the 10 directional motifs, while (b) shows the five non-directional motifs. We present: the sequence logo; the consensus sequence using IUPAC letters to represent degenerate bases - R (G, A), W (A, T), Y (T, C), K (G, T), M (A, C), S (G, C), N (A, T, G, C); the name assigned in this work; the common name if it exists; designations from previous work [10]; the number of 8-mers that peaked that were placed in the family; peak location as base-pairs relative to the TSS; clustering factor (CF+) on the positive strand; clustering factor (CF-) on the negative strand; the bins that were pooled to define the peak; and the unique genes in the peak.
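The degenerate-base letters listed in this caption can be expanded mechanically when searching a sequence for a consensus. The sketch below converts a consensus into a regular expression, using only the IUPAC codes named above. The function names are hypothetical, the test sequence is made up, and the example consensus RCACGTGY is the E-box variant from the Figure 10 caption.

```python
import re

# Only the IUPAC codes listed in the caption, plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[GA]", "W": "[AT]", "Y": "[TC]", "K": "[GT]",
    "M": "[AC]", "S": "[GC]", "N": "[ATGC]",
}


def consensus_to_pattern(consensus):
    """Compile a consensus into a regex that also finds overlapping hits."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    # A zero-width lookahead lets finditer report overlapping matches.
    return re.compile("(?=" + body + ")")


def find_motif(consensus, seq):
    """Return 0-based start positions of the consensus on the given strand."""
    pattern = consensus_to_pattern(consensus)
    return [m.start() for m in pattern.finditer(seq.upper())]


hits = find_motif("RCACGTGY", "TTACACGTGCTT")    # R = A and Y = C both match
misses = find_motif("RCACGTGY", "TTCCACGTGCTT")  # C is not a valid R
```

This scans one strand only. For a directional motif that is the intended behavior, while a non-directional motif would also need the reverse complement of the sequence searched.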
Figure 6
The distribution of the 15 identified motifs in Drosophila promoters. (a-o) The number of occurrences of each motif, in each 20 bp bin, for the positive strand (solid red) and the negative strand (dashed black). The inserts show the same data plotted at a single nucleotide resolution from -100 bp to +100 bp relative to the TSS. Inserts for the directional motifs (DMp1 to 5 and DMv1 to 5) show the distribution on the positive strand only, while those for the non-directional motifs (NDM1 to 5) show the distribution for both strands. (a-e) The directional motifs that have a precise localization (DMp); (f-j) the directional motifs with a variable localization (DMv); (k-o) the non-directional motifs that all have a variable localization (NDM).
Figure 7
The localization, on the positive strand, of all 4,096 6-mers in Drosophila and human promoters. Clustering factor (CF+) for the positive strand, plotted at a single base-pair resolution, at the position of the most populated bp, for all 4,096 6-mers. (a) CF+ from 10,914 Drosophila promoters; (b) CF+ from 15,011 human (UCSC); (c) CF+ from 12,926 human (DBTSS) promoters; (d) the exact placement of Drosophila TATA, INR variants, and DPE variants relative to each other. The sequence is broken into 10 bp segments.
Figure 8
The distribution of 15 'Drosophila specific' motifs in Drosophila and human promoters. (a-o) The number of occurrences of each of the 15 identified Drosophila motifs in each 20 bp bin for Drosophila (dotted black), human (UCSC; solid red) and human (DBTSS; dashed blue) promoters. For the ten directional motifs, only the occurrences on the positive strand are represented. For the five non-directional elements, the occurrences on both the positive and negative strand are represented. (x) The distributions of the INR motif (TGACTY), from -100 to +100, for both Drosophila and human promoters at a single base-pair resolution. The number of occurrences of each element has been normalized, based on a dataset of 10,000 promoters, to compensate for the different sizes of the datasets.
Figure 9
The distribution of 8 'human specific' motifs in Drosophila and human promoters. (a-h) The number of occurrences of each previously identified [11] human specific motif in each 20 bp bin for Drosophila (dotted black), human (UCSC; solid red) and human (DBTSS; dashed blue) promoters. The number of occurrences of each element has been normalized, based on a dataset of 10,000 promoters, to compensate for the different sizes of the datasets.
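The normalization to a dataset of 10,000 promoters amounts to a simple rescaling of each bin count. The sketch below assumes linear scaling by dataset size, which is the most direct reading of the caption. The function name and the raw counts are invented for illustration.

```python
def normalize_counts(bin_counts, n_promoters, reference_size=10_000):
    """Rescale per-bin counts to what a 10,000-promoter dataset would give."""
    return [count * reference_size / n_promoters for count in bin_counts]


# Dataset sizes are taken from the captions; the raw counts are made up.
drosophila = normalize_counts([120, 30], n_promoters=10_914)
human_ucsc = normalize_counts([120, 30], n_promoters=15_011)
```

With identical raw counts, the smaller Drosophila dataset yields the larger normalized values, which is the size effect the normalization compensates for.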
Figure 10
E-box variants that peak in Drosophila and human promoters. (a-d) The number of occurrences of (a) CACGTG, (b) CAGCTG, (c) RCACGTGY and (d) YCACGTGR in each 20 bp bin for Drosophila (dotted black), human (UCSC; solid red), and human (DBTSS; dashed blue) promoters.
Figure 11
Correlations between DNA motifs in promoters and function (GO terms and mRNA expression properties). In both sections of the figure, promoter lists in blue are DMp, green are DMv, and red are NDM. Control groups with the DNA motifs not in the peak but between -1,000 bp and +499 bp are in black with an asterisk. (a) False-color image of representation bias in GO terms and mRNA expression clusters for the 15 DNA motifs, either in the peak or elsewhere in the promoter region. Values plotted are -log10(p value) calculated by Fisher's exact test. Data for the 54 most strongly correlated GO terms are shown (some redundant GO terms are removed). On the far left are results for over/under representation in self-organizing map (SOM) clusters identified from previously published expression data [20]. Over-represented categories are colored in red and under-represented categories are in blue. N values displayed at the top are total numbers of genes in the reference set assigned to that group. (b) False-color image of hierarchically clustered median percentile ranks of mRNA expression ratios, for previously published data for embryo and adult samples [21]. Each ratio represents expression relative to a global mean across arrays. Columns represent each of 89 array experiments, clustered so that embryo samples are at left and adult samples are at right. 'All Promoters' represents all genes and shows no preferences (median percentile rank = 50).
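The -log10(p value) scores in panel (a) can be illustrated with a one-sided Fisher's exact test for over-representation, computed from the hypergeometric distribution. This is a sketch under assumptions: the caption does not say whether one-sided or two-sided tests were used, the function name is hypothetical, and the gene counts are invented.

```python
from math import comb, log10


def overrepresentation_p(overlap, n_motif, n_term, n_total):
    """One-sided Fisher's exact p-value, P(X >= overlap).

    n_total: genes in the reference set
    n_term:  genes annotated with the GO term
    n_motif: genes whose promoter carries the motif in the peak
    overlap: genes in both groups
    """
    upper = min(n_motif, n_term)
    favourable = sum(
        comb(n_term, k) * comb(n_total - n_term, n_motif - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(n_total, n_motif)


# Toy example: 20 genes, 5 carry the GO term, 5 carry the motif, all 5 overlap.
p_value = overrepresentation_p(overlap=5, n_motif=5, n_term=5, n_total=20)
score = -log10(p_value)
```

Under-representation would be scored the same way from the opposite tail, P(X <= overlap), which is how a single heat map can show both the red and the blue categories.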
Figure 12
Correlations between five INR variants localized exactly at the TSS in promoters and function (GO terms and mRNA expression properties). (a) False-color image of representation bias in GO terms and mRNA expression clusters for the five variants of the INR motif in the peak. Values are calculated and displayed as in Figure 11a. The 42 most strongly correlated GO terms are shown. Note that each INR variant correlates with different GO terms. (b) False-color image of hierarchically clustered median percentile ranks of mRNA expression ratios, for previously published data for embryo and adult samples [21]. Data are calculated and displayed as in Figure 1

References

    1. Smale ST, Kadonaga JT. The RNA polymerase II core promoter. Annu Rev Biochem. 2003;72:449–479. doi: 10.1146/annurev.biochem.72.121801.161520. - DOI - PubMed
    1. Margolis P, Driks A, Losick R. Differentiation and the establishment of cell type during sporulation in Bacillus subtilis. Curr Opin Genet Dev. 1991;1:330–335. doi: 10.1016/S0959-437X(05)80296-5. - DOI - PubMed
    1. Hiller M, Chen X, Pringle MJ, Suchorolski M, Sancak Y, Viswanathan S, Bolival B, Lin TY, Marino S, Fuller MT. Testis-specific TAF homologs collaborate to control a tissue-specific transcription program. Development. 2004;131:5297–5308. doi: 10.1242/dev.01314. - DOI - PubMed
    1. Kai T, Williams D, Spradling AC. The expression profile of purified Drosophila germline stem cells. Dev Biol. 2005;283:486–502. doi: 10.1016/j.ydbio.2005.04.018. - DOI - PubMed
    1. Hochheimer A, Tjian R. Diversified transcription initiation complexes expand promoter selectivity and tissue-specific gene expression. Genes Dev. 2003;17:1309–1320. doi: 10.1101/gad.1099903. - DOI - PubMed
