Genome Res. 2015 Jul;25(7):1008-17.
doi: 10.1101/gr.188193.114. Epub 2015 May 12.

Core promoter sequence in yeast is a major determinant of expression level


Shai Lubliner et al. Genome Res. 2015 Jul.

Abstract

The core promoter is the regulatory sequence to which RNA polymerase is recruited and where it acts to initiate transcription. Here, we present the first comprehensive study of yeast core promoters, providing massively parallel measurements of core promoter activity and of TSS locations and relative usage for thousands of native and designed sequences. We found core promoter activity to be highly correlated to the activity of the entire promoter and that sequence variation in different core promoter regions substantially tunes its activity in a predictable way. We also show that location, orientation, and flanking bases critically affect TATA element function, that transcription initiation in highly active core promoters is focused within a narrow region, that poly(dA:dT) orientation has a functional consequence at the 3' end of promoters, and that orthologous core promoters across yeast species have conserved activities. Our results demonstrate the importance of core promoters in the quantitative study of gene regulation.


Figures

Figure 1.
Illustration of our experimental system. Oligonucleotides from a library comprising 13,000 designed synthetic sequences (Agilent Technologies) containing a 118-bp-long variable core promoter sequence were ligated into a low-copy plasmid (top). Designed sequences included 7536 unique core promoter sequences, and for 5464 of them, we designed a second instance that was additionally barcoded by synonymous mutations within the first 36 bp of the YFP. The barcoded sequences also differed from the nonbarcoded ones by four mismatches within the 10 bp upstream of the YFP. The plasmid pool was transformed into yeast to create a heterogeneous pool of yeast cells, each cell expressing YFP at a different level (middle). To measure expression, cells were sorted using fluorescence-activated cell sorting (FACS) into 16 expression bins by their YFP/mCherry ratio, and the core promoter sequences were amplified using bin-specific barcoded primers and sent to parallel sequencing (left pipeline). Sequencing reads coming from YFP-barcoded instances were removed. Each read was then mapped to a YFP/mCherry bin and a core promoter sequence. This gave for each core promoter sequence the binned distribution of YFP/mCherry levels over the cells that had that sequence (bottom left), from which we computed the mean YFP/mCherry (see Supplemental Note). To map TSSs, we extracted total RNA from the pool of yeast cells, performed 5′ RACE using primers specific to the YFP sequence, and sequenced the products (right pipeline). Sequencing reads not mapping to YFP-barcoded instances were removed. Each read was then mapped to a core promoter sequence by its YFP barcode, enabling us to compute the transcription initiation landscape of YFP-barcoded core promoter sequences (bottom right) (see Supplemental Note).
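The bin-based expression estimate described in this caption can be sketched as a weighted mean. This is only an illustration, not the authors' pipeline (their exact procedure is in the Supplemental Note): the function name, the representative YFP/mCherry value per bin, and the read counts are hypothetical, and the sketch assumes read counts are already proportional to the number of cells in each bin.

```python
def mean_expression(read_counts, bin_values):
    """Weighted mean of per-bin expression values, weighted by read counts.

    read_counts: reads mapped to one core promoter sequence in each bin.
    bin_values:  representative YFP/mCherry value of each bin (assumed known).
    """
    if len(read_counts) != len(bin_values):
        raise ValueError("need exactly one value per bin")
    total = sum(read_counts)
    if total == 0:
        return None  # no reads: expression cannot be estimated
    return sum(c * v for c, v in zip(read_counts, bin_values)) / total


# Hypothetical example with 4 bins instead of 16, for brevity.
counts = [0, 10, 30, 10]
values = [1.0, 2.0, 3.0, 4.0]
print(mean_expression(counts, values))  # 3.0
```

A sequence with no mapped reads returns None rather than zero, so that missing data is not mistaken for low expression.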
Figure 2.
Core promoter activity is highly correlated to the activity of the entire promoter. (A) A comparison of our core promoter activity measurements (x-axis) to previously measured promoter activities (y-axis) for 238 Saccharomyces sensu stricto RP promoters (Zeevi et al. 2014) reveals a high correlation between the two measures. Dark red dots mark promoters that are not regulated by Rap1. (B) Similar to A for 133 constitutively expressed S. cerevisiae genes (Keren et al. 2013).
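A comparison of this kind reduces to a correlation between two paired lists of measurements. The caption does not say which coefficient was used, so the sketch below assumes Pearson's r, and the function name and example values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements: core promoter vs. full promoter activity.
core = [1.0, 2.0, 3.0, 4.0]
full = [2.1, 3.9, 6.2, 8.0]
print(round(pearson_r(core, full), 3))
```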
Figure 3.
Sequence variation in different core promoter regions substantially affects activity. Results of learning linear models that predict the effects of mutating various native core promoter regions (see main text) on core promoter activity, based on sequence features that measure differences between mutant and native core promoters. We used a K-fold cross-validation scheme, such that each mutant appeared once in a held-out test set, and K − 1 times in a training set. (A) Results for PIC region mutations. For each mutated core promoter, we plotted its measured percent change (compared to the native core promoter) in core promoter activity (x-axis) against its predicted one (y-axis, predicted by the linear model learned when that mutant was part of the held-out test set). Gray lines mark the axes’ zero values. We also report mean performance measures (r and R² statistics) of the models over the test sets. (B) Same as A for scanning region mutations. (C) Same as A for initiation region mutations. (D) Same as A for sliding window mutations. (E) An illustration summarizing classes of sequence features included in our linear models and their predicted effect on core promoter activity. All learned features are specified in Supplemental Figures 1–4. The golden right arrow marks the translation start site.
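The cross-validation scheme in this caption, in which every mutant is predicted exactly once by a model that never saw it, can be sketched as follows. This is a simplified illustration under stated assumptions: the models in the paper use many sequence features, whereas this sketch fits a single hypothetical feature by ordinary least squares, and the round-robin fold assignment and all names are assumptions.

```python
def kfold_indices(n, k):
    """Split range(n) into k disjoint test folds (round-robin assignment)."""
    return [list(range(fold, n, k)) for fold in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("training feature is constant; slope is undefined")
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return a, mean_y - a * mean_x


def held_out_predictions(xs, ys, k):
    """Predict each sample with the model trained without that sample's fold."""
    preds = [None] * len(xs)
    for test in kfold_indices(len(xs), k):
        held = set(test)
        train = [i for i in range(len(xs)) if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = a * xs[i] + b
    return preds


# Hypothetical feature values and percent changes in activity.
feature = [0, 1, 2, 3, 4, 5]
change = [2 * x + 1 for x in feature]
print(held_out_predictions(feature, change, k=3))
```

Plotting the measured values against these held-out predictions gives the kind of scatter shown in panels A to D.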
Figure 4.
TATA element location, orientation, sequence, and flanking bases affect its functionality. (A) We inserted the TATA consensus 8-mer TATAAAAA into different positions along the PDC1 background sequence (see main text). Each row in the left panel heatmap corresponds to one insertion case, with TATA start position marked in dark blue (in a few cases, the insertion actually resulted in two overlapping TATA 8-mers, and then both start positions are marked), and the measured TSS distribution appearing in red and yellow (see color bar on the left). The effect of every insertion on core promoter activity is shown by the corresponding bar within the right panel. The light blue dashed lines separate the results of three regions: [−118,−96], [−95,−69], and the five insertion positions further downstream. Note that there are a few instances with missing TSS or expression data. (B) For the same data shown in A, an illustration of the percent initiation events (y-axis) at position −30 (red dots) vs. positions [−29,−1] (blue dots), as a function of the TATAAAAA 8-mer insertion start position (x-axis). The trend lines of corresponding colors show the moving average using a sliding window of length 10. (C) Box plots of the percent changes to core promoter activity caused by knockout mutations of native TATA elements having the consensus 6-mer TATAWA (W = A/T), in three core promoter windows: [−118,−99], [−98,−69], and [−68,−1]. Assignment to windows was based on the TATAWA start position. (D) Same as in C for inversions of native TATAAANN (N = A/C/G/T) 8-mers. (E) For insertions of the TATAAAAA and TATATATA consensus TATA 8-mers into position −98 of the ENO2 and PDC1 backgrounds, we also generated instances in which we additionally randomized their flanking sequences of lengths 2, 5, or 10 bp. For each such instance, we plotted the percent change to core promoter activity (y-axis). Pink, red, and dark red dots mark cases with random flanks of 2 bp, 5 bp, and 10 bp, respectively. 
Dashed light blue lines mark the value measured for insertions without flanking sequence randomization. (F) Box plots of the percent changes to core promoter-induced expression caused by insertion (into position −98) of either consensus TATAWAWR (W = A/T, R = A/G) 8-mers or TATA 8-mers that are one mismatch away from a consensus 8-mer. (Top) Insertions into the PDC1 background; (bottom) insertions into the ENO2 background.
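The consensus patterns in this caption use IUPAC degenerate codes (W = A/T, R = A/G, N = any base), and matches are assigned to windows by their start position. A minimal sketch of both steps follows. It assumes that coordinates run from −118 to −1 across a 118-bp sequence, so that the last base is −1; the exact coordinate mapping, the function names, and the example sequence are assumptions, not taken from the paper.

```python
import re

# IUPAC codes used in the caption, expressed as regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "N": "[ACGT]"}

# Windows from panel C, as inclusive (low, high) coordinate pairs.
WINDOWS = [(-118, -99), (-98, -69), (-68, -1)]


def motif_starts(seq, motif):
    """Return 0-based start indices of all matches, including overlapping ones."""
    pattern = "".join(IUPAC[ch] for ch in motif)
    # The lookahead consumes no characters, so overlapping matches are found.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", seq)]


def to_relative(index, seq_len):
    """Map a 0-based index to a negative coordinate (assumed: last base is -1)."""
    return index - seq_len


def window_of(pos):
    """Return the window containing pos, or None if it lies outside all windows."""
    for low, high in WINDOWS:
        if low <= pos <= high:
            return (low, high)
    return None


# Hypothetical 118-bp sequence with one TATAAA placed at index 20.
seq = "C" * 20 + "TATAAA" + "C" * 92
for start in motif_starts(seq, "TATAWA"):
    pos = to_relative(start, len(seq))
    print(pos, window_of(pos))  # -98 (-98, -69)
```

Overlapping matches matter here because, as the caption notes for panel A, an insertion can create two overlapping TATA 8-mers.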
Figure 5.
Highly expressed core promoters tend to have focused transcription initiation. A comparison of our core promoter activity measurements (x-axis) to our measure of focused transcription initiation (y-axis) for 252 native core promoters (taken from the union of the two sets in Fig. 2) reveals a significant correlation between the two measures. Based on the center values ([min. + max.]/2) of the two measures, native core promoters were classified as having either high or low activity (x-axis classification) and as having either focused or dispersed transcription initiation (y-axis classification). Core promoters with high activity were found to be enriched for focused transcription initiation.
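The center-value classification in this caption can be sketched as a threshold at (min + max)/2 on each axis. The names and example values below are hypothetical, and two details are assumptions: that a larger value of the focus measure means more focused initiation, and that a value exactly equal to the threshold falls in the lower class.

```python
def midpoint(values):
    """Center value of a list: (min + max) / 2."""
    return (min(values) + max(values)) / 2


def classify(activity, focus):
    """Label each promoter by activity (high/low) and initiation (focused/dispersed)."""
    activity_mid = midpoint(activity)
    focus_mid = midpoint(focus)
    return [
        ("high" if a > activity_mid else "low",
         "focused" if f > focus_mid else "dispersed")
        for a, f in zip(activity, focus)
    ]


# Hypothetical measurements for four core promoters.
activity = [1.0, 2.0, 9.0, 10.0]
focus = [0.1, 0.8, 0.7, 0.9]
for labels in classify(activity, focus):
    print(labels)
```

Counting the four label combinations yields the 2 × 2 table on which an enrichment test could then be run.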
Figure 6.
Effect of poly(dA)/poly(dT) inversions in different parts of the core promoter. (A) Box plots of the percent changes to core promoter activity caused by inversions of native poly(dA) homopolymers in four core promoter windows: [−118,−91], [−90,−61], [−60,−31], and [−30,−1]. Assignment to windows was based on the poly(dA) start position. (B) Same as in A for inversions of native poly(dT) homopolymers.
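Locating homopolymer runs and assigning them to the four windows by start position can be sketched as below. Several points are assumptions rather than details from the caption: the minimum run length of 5, the coordinate mapping in which the last base of a 118-bp sequence is −1, and the reading of "inversion" as replacing the run with its reverse complement in place (which turns a poly(dA) run into a poly(dT) run of the same length). All names are hypothetical.

```python
import re

# Windows from panel A, as inclusive (low, high) coordinate pairs.
WINDOWS = [(-118, -91), (-90, -61), (-60, -31), (-30, -1)]


def homopolymer_runs(seq, base, min_len=5):
    """Return (start_index, length) for each maximal run of `base`."""
    pattern = re.escape(base) + "{%d,}" % min_len
    return [(m.start(), len(m.group())) for m in re.finditer(pattern, seq)]


def window_of(pos):
    """Return the window containing pos, or None if it lies outside all windows."""
    for low, high in WINDOWS:
        if low <= pos <= high:
            return (low, high)
    return None


def invert_run(seq, start, length):
    """Replace an A/T run with its reverse complement (assumed meaning of inversion)."""
    swap = {"A": "T", "T": "A"}
    run = seq[start:start + length]
    inverted = "".join(swap[b] for b in reversed(run))
    return seq[:start] + inverted + seq[start + length:]


# Hypothetical 118-bp sequence with one poly(dA) run of length 6 at index 30.
seq = "G" * 30 + "AAAAAA" + "G" * 82
for start, length in homopolymer_runs(seq, "A"):
    pos = start - len(seq)  # assumed coordinate: last base is -1
    print(pos, window_of(pos), invert_run(seq, start, length)[28:38])
```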
Figure 7.
Core promoter activity is conserved between orthologous Saccharomyces sensu stricto RPs. A comparison of the core promoter activity of native RP core promoters from S. cerevisiae and their orthologous counterparts from each of S. paradoxus, S. mikatae, and S. bayanus reveals very high correlations.


