PLoS Genet, 11 (4), e1005166
eCollection

Systematic Profiling of poly(A)+ Transcripts Modulated by Core 3' End Processing and Splicing Factors Reveals Regulatory Rules of Alternative Cleavage and Polyadenylation

Wencheng Li et al. PLoS Genet.

Abstract

Alternative cleavage and polyadenylation (APA) results in mRNA isoforms containing different 3' untranslated regions (3'UTRs) and/or coding sequences. How core cleavage/polyadenylation (C/P) factors regulate APA is not well understood. Using siRNA knockdown coupled with deep sequencing, we found that several C/P factors can play significant roles in 3'UTR-APA. Whereas Pcf11 and Fip1 enhance usage of proximal poly(A) sites (pAs), CFI-25/68, PABPN1 and PABPC1 promote usage of distal pAs. Strong cis element biases were found for pAs regulated by CFI-25/68 or Fip1, and the distance between pAs plays an important role in APA regulation. In addition, intronic pAs are substantially regulated by splicing factors, with U1 mostly inhibiting C/P events in introns near the 5' end of the gene and U2 suppressing those in introns with features for efficient splicing. Furthermore, PABPN1 inhibits expression of transcripts with pAs near the transcription start site (TSS), a property possibly related to its role in RNA degradation. Finally, we found that groups of APA events regulated by C/P factors are also modulated in cell differentiation and development with distinct trends. Together, our results support an APA code where an APA event in a given cellular context is regulated by a number of parameters, including relative location to the TSS, splicing context, distance between competing pAs, surrounding cis elements and concentrations of core C/P factors.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Systematic analysis of APA events modulated by C/P and splicing factors.
(A) Experimental design. Proliferating C2C12 myoblast cells were transfected with siRNAs against 23 C/P factors and 3 splicing factors for 48 hr, followed by APA analysis using 3’ region extraction and deep sequencing (3’READS). Sequences of the siRNAs used for knockdown or control are listed in S1 Table. For direct inhibition of U1 snRNP (U1) activity, C2C12 cells were transfected with U1D or mutant U1D (mU1D) oligonucleotides (oligos, shown in (B)) for 8 hr or 24 hr, followed by 3’READS analysis. (B) Oligos used to study U1 functions. Top, the 5’ end region of U1 snRNA and a consensus sequence surrounding the 5’ splice site (5’SS, -3 nt to +6 nt, based on all annotated 5’SS in the mouse genome) are shown to illustrate the base-pairing between U1 snRNA and the 5’SS sequence; Bottom, U1D or mutant U1D (mU1D) sequences. Locked nucleic acids are in upper case and 2'-O-methyl nucleotides are in lower case. Nucleotides shown in red correspond to the -3 to +6 region surrounding the 5’SS. mU1D differs from U1D in two nucleotides as indicated. (C) Gene expression of different factors after 48 hr of siRNA knockdown (KD), as measured by reverse transcription-quantitative PCR (RT-qPCR) or Western Blot (WB). Error bars for RT-qPCR are standard deviation based on two technical replicates. Some KD samples were not analyzed by WB, as indicated. See S1 Fig for WB images.
Fig 2. 3’UTR-APA.
(A) Schematic of 3’UTR-APA that results in alternative 3’UTR isoforms. The region between two pAs, pA1 and pA2, is called the alternative UTR (aUTR). AAAn, poly(A) tail. CDS and 3’UTR are indicated. Regulation of APA is represented by RED (relative expression difference), whose formula is shown in the figure. #pA1 and #pA2 are numbers of poly(A) site-supporting (PASS) reads for pA1 and pA2, respectively. Test and Ctrl are test and control samples, respectively. (B) RED values for different samples. The top two most abundant APA isoforms based on the number of PASS reads of each gene were analyzed. Only genes with ≥20 PASS reads for proximal and distal pAs combined were used. The median RED value of each sample is shown as a thick black line and interquartile range (between the 25th and 75th percentiles) is indicated by a gray box. (C) An example of 3’UTR-APA regulation. Left, APA isoforms of Timp2. The gene structure and the zoomed-in 3’-most exon are shown on the top. Two pAs are indicated, and their 3’UTR lengths and RPM values in proliferating and differentiated C2C12 cells are shown. Sequence conservation of the shown region (based on mammals) is indicated, with the height of line reflecting the degree of conservation. RT-qPCR amplicons to study APA regulation are indicated. Middle, regulation of the two 3’UTR-APA isoforms of Timp2 in several samples as indicated. RED values (distal pA vs. proximal pA, knockdown (KD) vs. siCtrl or differentiated cells vs. proliferating cells), q-values (Significance Analysis of Alternative Polyadenylation, SAAP, see Materials and Methods for detail), and direction of 3’UTR change (shortened, Sh; or lengthened, Le) are indicated. Right, RT-qPCR analysis of 3’UTR-APA isoforms of Timp2 in several samples. RT-qPCR ΔΔCycle threshold (Ct) values were calculated by comparing Ct difference between proximal and distal amplicons in test vs. ctrl samples. Test is a sample from knockdown or differentiated cells, and ctrl is a sample from siCtrl or proliferating cells. Note the samples used for RT-qPCR analysis were not the same as those used for 3’READS. Error bars are standard deviation based on two replicates. (D) Normalized number of genes with regulated 3’UTR-APA in each sample as examined by GAAP. Normalized number is based on (observed value − expected value), and zero is used if a value is negative. Red and blue bars represent genes with lengthened 3’UTRs (Le, distal pA isoform upregulated) and shortened 3’UTRs (Sh, proximal pA isoform upregulated), respectively. Q-value < 0.05 (SAAP) was used to select genes with significant 3’UTR-APA regulation. Only the top two most abundant APA isoforms (based on the number of PASS reads) of each gene were used for this analysis. Error bars are standard deviation based on 20 rounds of bootstrapping. Samples are sorted by the total number of genes with 3’UTR-APA changes. Data for C2C12 differentiation are shown at the bottom for comparison. Log2(Le/Sh) is log2(ratio) of the number of Le genes to the number of Sh genes. (E) Correlation between normalized no. of genes with 3’UTR-APA (Le+Sh in (D)) and percent of genes with >2 pAs regulated (see S3 Fig). C2C12 differentiation data and siPABPN1 data are shown in red and others in gray. R2 is based on linear regression of all dots. (F) Correlation between log2(Le/Sh) and median aUTR size change. Data from all samples shown in (D) except the siCFI-68 and siCFI-25 samples are plotted, with C2C12 differentiation data and siPABPN1 data shown in red diamond and square, respectively, and others in gray dots. R2 is based on linear regression of all gray dots. Median aUTR size change was calculated by the median aUTR size of lengthened 3’UTRs minus that of shortened 3’UTRs. Each gene was calculated once.
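The RED formula appears only in the figure and is not reproduced in this legend. The following is a minimal sketch, assuming RED is the log2 ratio of distal to proximal PASS reads in the test sample minus the same ratio in the control sample, which matches the description "distal pA vs. proximal pA, KD vs. siCtrl". The function names, the per-sample application of the 20-read filter, and the choice to return None for zero counts are assumptions of this sketch, not the authors' implementation.

```python
import math

MIN_READS = 20  # legend: genes need >= 20 PASS reads, proximal and distal combined


def passes_filter(pa1_reads, pa2_reads, min_reads=MIN_READS):
    """Read-depth filter, applied here to one sample (an assumption)."""
    return pa1_reads + pa2_reads >= min_reads


def red(test_pa1, test_pa2, ctrl_pa1, ctrl_pa2):
    """Assumed RED: log2(#pA2/#pA1) in Test minus log2(#pA2/#pA1) in Ctrl.

    pA1 is the proximal site and pA2 the distal site. Returns None when any
    count is zero, because the log ratio is undefined; how zero counts were
    handled in the original analysis is not stated in the legend.
    """
    if min(test_pa1, test_pa2, ctrl_pa1, ctrl_pa2) <= 0:
        return None
    return math.log2(test_pa2 / test_pa1) - math.log2(ctrl_pa2 / ctrl_pa1)


def normalized_count(observed, expected):
    """Panel (D): observed minus expected, with negative values set to zero."""
    return max(observed - expected, 0)


# Hypothetical counts: the distal isoform gains reads in the test sample.
score = red(test_pa1=50, test_pa2=200, ctrl_pa1=100, ctrl_pa2=100)
```

Under this assumed definition a positive score means relatively more distal pA usage in the test sample (3’UTR lengthening) and a negative score means shortening.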
Fig 3. CDS-APA.
(A) Schematic of CDS-APA. (B) Normalized number of genes with regulated CDS-APA as examined by GAAP. Red and blue bars represent genes with upregulated CDS-APA isoforms (UP) and downregulated isoforms (DN), respectively. All CDS-APA isoforms were combined and compared to all 3’-most exon isoforms combined by SAAP. Q-value < 0.05 (SAAP) was used to select genes with significant CDS-APA regulation. Error bars are standard deviation based on 20 rounds of bootstrapping. Samples are sorted by the total number of genes with CDS-APA changes. Data for C2C12 differentiation are shown at the bottom for comparison. Log2(UP/DN) is log2(ratio) of the number of UP genes to the number of DN genes. (C) Normalized expression changes of intronic pA isoforms in several samples. Introns were divided into first (+1), second (+2), last (-1), second to last (-2), and middle (between +2 and -2 introns) groups. Only genes with ≥4 introns and only pA isoforms with ≥10 PASS reads in the two compared samples combined were analyzed. Expression changes are log2(ratio) of PASS reads in test sample vs. control sample. Values for the five intron groups were normalized by mean-centering to reveal bias of intron location. Error bars are standard error of mean. (D) Features of introns containing pAs of upregulated isoforms, including intron size, and 5’ and 3’ splice site (SS) strengths. Numbers are significance scores, calculated by -log10(P)*S, where P was based on the Wilcoxon rank sum test comparing an intron set of interest with a background set, and S = 1 when the intron set of interest had a larger median value (intron size, 5’SS strength or 3’SS strength) than the background set or -1 otherwise. The background set was derived from introns that contained detectable pA isoform expression in control samples. Introns were divided into five groups based on location, as in (C). The significance scores are colored according to the color scheme shown in the graph.
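The signed significance score in (D) and the mean-centering in (C) can be sketched as follows. This is an illustration under stated assumptions: the function names and example values are hypothetical, and the p-value is taken as an input rather than computed (it could come from a Wilcoxon rank sum test such as `scipy.stats.ranksums`).

```python
import math


def significance_score(p_value, median_of_interest, median_background):
    """Signed score -log10(P) * S from panel (D).

    S is 1 when the intron set of interest has the larger median and -1
    otherwise. A p-value of exactly zero is not handled in this sketch.
    """
    s = 1 if median_of_interest > median_background else -1
    return -math.log10(p_value) * s


def mean_center(values_by_group):
    """Panel (C): subtract the mean across intron groups from each group."""
    mean = sum(values_by_group.values()) / len(values_by_group)
    return {group: value - mean for group, value in values_by_group.items()}


# Hypothetical log2(ratio) values for the five intron groups.
centered = mean_center(
    {"+1": 1.0, "+2": 0.5, "middle": 0.0, "-2": 0.0, "-1": -0.5}
)
larger = significance_score(0.001, median_of_interest=5000, median_background=3000)
smaller = significance_score(0.001, median_of_interest=2000, median_background=3000)
```

Here `larger` is positive and `smaller` is negative, so the sign alone tells whether the feature is higher or lower in the intron set of interest than in the background.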
Fig 4. C/P events around the transcriptional start site (TSS).
(A) Schematic of C/P events around the TSS. uaRNA and spRNA are upstream antisense and sense proximal RNAs, respectively. (B) Distribution of uaRNA and spRNA pAs utilized in control C2C12 cells. In this study, we required that the pA of a uaRNA/spRNA was within 2 kb of the TSS, a uaRNA did not overlap with any known protein-coding genes, and the pA of an spRNA was not in the 3’-most exon or in a single-exon gene. (C) Nucleotide frequency profiles around pAs of uaRNAs (left) and spRNAs (middle) and around 3’-most pAs (right). Dotted lines indicate the 25% value. (D) Normalized number of genes with regulated uaRNA expression. Red and blue bars represent genes with upregulated (UP) and downregulated (DN) uaRNA expression, respectively. Log2(UP/DN) is log2(ratio) of the number of UP genes to the number of DN genes. All uaRNAs were combined and compared by SAAP with all sense strand transcripts whose pAs were beyond 2 kb from the TSS. Q-value < 0.05 (SAAP) was used to select genes with a significant uaRNA expression difference. Error bars are standard deviation based on 20 rounds of bootstrapping. Samples are sorted by the total number of genes with uaRNA expression changes. (E) Normalized number of genes with regulated spRNA expression. Red and blue bars represent genes with upregulated (UP) and downregulated (DN) spRNA expression, respectively. All spRNAs were combined and compared by SAAP with all sense strand transcripts whose pAs were beyond 2 kb from the TSS. Q-value < 0.05 (SAAP) was used to select genes with a significant spRNA expression difference. Error bars are standard error of mean based on 20 rounds of bootstrap sampling. Samples are sorted by the total number of genes with spRNA expression changes. Log2(UP/DN) is log2(ratio) of the number of UP genes to the number of DN genes. (F-H) Metagene plots of uaRNA and spRNA expression in siPABPN1 (F), U1D (8 hr and 24 hr, G), and siRrp44 + siRrp6 (H) samples. Expression is represented by reads per million (RPM, poly(A) site-supporting reads only) at pA positions.
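The RPM values and the 2 kb criterion used in this legend can be illustrated with a short sketch. It assumes the RPM denominator is the total number of PASS reads in the sample and that "within 2 kb" includes a distance of exactly 2,000 nt; both are assumptions, as are the function names and example numbers.

```python
TSS_WINDOW = 2000  # legend: uaRNA/spRNA pAs lie within 2 kb of the TSS


def rpm(pass_reads_at_pa, total_pass_reads):
    """Reads per million, counting poly(A) site-supporting reads only."""
    return pass_reads_at_pa * 1_000_000 / total_pass_reads


def near_tss(pa_position, tss_position, window=TSS_WINDOW):
    """True when the pA lies within the window around the TSS (boundary included)."""
    return abs(pa_position - tss_position) <= window


# Hypothetical pA with 25 PASS reads in a sample of 5 million PASS reads.
example_rpm = rpm(25, 5_000_000)
```

The other legend criteria (no overlap with protein-coding genes for uaRNAs, exclusion of 3’-most exons and single-exon genes for spRNAs) depend on gene annotation and are not modeled here.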
Fig 5. Detailed analysis of five C/P factors.
(A) Schematic of the experimental design. Proliferating C2C12 cells were harvested 32 hr after knockdown (KD), and both total and nuclear RNAs were extracted for 3’READS analysis. (B) Western Blot analysis of protein expression after 32 hr of KD. The percent of expression in KD cells compared to siCtrl cells for each KD is indicated. (C) Normalized number of genes with regulated 3’UTR-APA in each sample as examined by GAAP. See Fig 2D for details of the plot. Both total and nuclear RNA data are shown. (D) Cluster analysis of 3’UTR-APA regulation by the five factors. RED scores using the top two most abundant APA isoforms (based on the number of poly(A) site-supporting reads in all samples) of each gene were used for this analysis. Only pA isoforms with read number ≥5 in all samples were used. A RED score is the difference in relative expression of distal pA isoform vs. proximal pA isoform between KD and siCtrl cells, as illustrated in Fig 2A. RED scores are represented in a heatmap using the color scheme shown in the graph. Positive and negative RED scores indicate lengthened and shortened 3’UTRs, respectively. RED scores for APA events were set to 0 when q-value > 0.05 (SAAP). Pearson correlation was used as the metric for hierarchical clustering. (E) Venn diagrams comparing genes with significant 3’UTR-APA regulation by the five factors using total RNA (left) or nuclear RNA (right) data. (F) Relationship between extent of 3’UTR-APA regulation and aUTR size. Genes were divided into five bins based on the aUTR size (distance between the pAs of the top two most abundant APA isoforms). The aUTR size range for each bin is shown in the graph. The extent of 3’UTR-APA regulation is represented by average RED scores, based on the data shown in (D). Only genes with ≥20 PASS reads (proximal and distal pAs combined) in both KD and siCtrl samples were used for RED calculation. RED scores of genes in bin #1 were compared with those in bin #5 by the Wilcoxon rank sum test for each sample, and p-values are shown.
Fig 6. Cis elements associated with regulated pAs.
(A) Schematic showing the analysis method. As indicated, pAs were divided into proximal and distal pA groups, and pAs of regulated isoforms were compared only with other pAs in the same group. As such, proximal pAs were only compared with proximal pAs, and distal pAs only with distal pAs. pAs of upregulated and downregulated isoforms were analyzed separately. Only data from nuclear RNA samples were used for analysis, because they were expected to have fewer post-transcriptional effects than total RNA samples. (B) Number of 4-mers with significantly biased frequency of occurrence (P < 0.001, Fisher’s exact test) near regulated pAs. Regulated pAs were those with q-value < 0.05 (SAAP). Three regions around the pA were analyzed: -100 to -41 nt, -40 to -1 nt and +1 to +100 nt. Data for all 4-mers and top 6-mers are shown in S5 Table and S6 Table, respectively. (C) Significant 4-mers enriched for or depleted from pAs regulated by siCFI-68. Only the top five 4-mers for the regions with ≥5 significant 4-mers are shown. Numbers are significance scores (SS), calculated by -log10(P)*S, where P was based on the Fisher’s exact test and S = 1 for enrichment and -1 for depletion. (D) As in (C), significant 4-mers enriched for or depleted from pAs regulated by siFip1. (E) Regulation of different types of pAs by siFip1, as shown by Cumulative Distribution Function (CDF) curves of RED scores. Genes were divided into four groups based on i) distance between proximal and distal pAs (<120 nt or ≥120 nt), and ii) whether or not there was AAUAAA within 100 nt downstream of the proximal pA. These groups are also illustrated in the graph. The number of genes and the median RED score of each group are shown in a table. The differences between groups are indicated by p-values (Kolmogorov–Smirnov test).
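The three regions analyzed in (B) can be extracted and scanned for 4-mers as in the sketch below. It assumes that position +1 is the first nucleotide downstream of the cleavage site, that there is no position 0, and that every overlapping occurrence of a 4-mer is counted; the legend does not state these conventions, and the names and the toy sequence are hypothetical. The Fisher's exact test that follows the counting is not shown.

```python
from collections import Counter

# Regions from the legend as inclusive (start, end) offsets relative to the pA.
REGIONS = {
    "-100 to -41": (-100, -41),
    "-40 to -1": (-40, -1),
    "+1 to +100": (1, 100),
}


def region_sequence(seq, pa_index, start, end):
    """Slice one region; pa_index is the 0-based index of position +1."""
    # Negative offsets count back from pa_index; positive offsets are 1-based.
    lo = pa_index + start if start < 0 else pa_index + start - 1
    hi = pa_index + end + 1 if end < 0 else pa_index + end
    if lo < 0 or hi > len(seq):
        raise ValueError("sequence does not cover the requested region")
    return seq[lo:hi]


def count_kmers(region, k=4):
    """Count every overlapping k-mer in a region."""
    return Counter(region[i:i + k] for i in range(len(region) - k + 1))


# Toy RNA sequence with a UGUA element placed at positions -40 to -37.
toy_seq = "C" * 60 + "UGUA" + "C" * 36 + "G" * 100
toy_regions = {
    name: region_sequence(toy_seq, 100, start, end)
    for name, (start, end) in REGIONS.items()
}
toy_counts = count_kmers(toy_regions["-40 to -1"])
```

Counts like these, tabulated for regulated pAs and for the other pAs in the same proximal or distal group, would form the contingency table for each 4-mer.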
Fig 7. Comparison of APA events regulated by the five C/P factors with those regulated in C2C12 differentiation.
(A) APA regulation in C2C12 differentiation for genes that showed shortened 3’UTRs (Sh, blue line) or lengthened 3’UTRs (Le, red line) in different knockdown (KD) samples (total RNA only). APA regulation in C2C12 differentiation is represented by Cumulative Distribution Function (CDF) curves of RED scores. Only genes with q-value < 0.1 (SAAP) were used. The numbers of Sh genes and Le genes are shown in blue or red, respectively. The Δ value is the difference between the median RED scores of Le and Sh genes. Median RED scores are also indicated by vertical dotted lines. P-values (Kolmogorov–Smirnov test) are shown to indicate RED score difference between Le and Sh genes in C2C12 differentiation. (B) Ten groups of genes based on 3’UTR-APA regulation by siCFI-68 and siPcf11 (q-value < 0.1). Group 9 contains genes whose 3’UTRs were regulated by siPABPN1, siPABPC1, or siFip1. Group 10 contains other genes whose 3’UTR-APA isoforms were detectable in C2C12 cells. Le, 3’UTR lengthened; Sh, 3’UTR shortened. Number of genes and median aUTR size are also shown. (C) Median RED scores for gene groups shown in (B) in C2C12 differentiation. Group 6 is not included because of a very small number of genes (5) in the group. (D) RED scores of group 3 genes were compared with those of genes in other groups in C2C12 differentiation (left), 3T3-L1 differentiation (middle), and embryonic development (15 day vs. 11 day, right). As in (A), the difference between the median RED scores of two sets (Δ) and the p-value (Kolmogorov–Smirnov test) comparing the two sets are also shown.
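The Δ value in (A) can be sketched as follows, assuming genes are first split into Le and Sh sets by the sign of their RED score in a knockdown sample (q-value < 0.1) and that Δ is then the median differentiation RED score of the Le set minus that of the Sh set. The function names, dictionary layout, and gene values are hypothetical; the accompanying p-value (for example from `scipy.stats.ks_2samp`) is not computed here.

```python
from statistics import median


def split_by_kd_direction(kd_red, kd_q, diff_red, q_cutoff=0.1):
    """Return differentiation RED scores for genes lengthened or shortened in KD."""
    le_scores, sh_scores = [], []
    for gene, score in kd_red.items():
        # Keep only genes that pass the q-value cutoff and have a
        # differentiation RED score.
        if kd_q[gene] >= q_cutoff or gene not in diff_red:
            continue
        if score > 0:
            le_scores.append(diff_red[gene])
        elif score < 0:
            sh_scores.append(diff_red[gene])
    return le_scores, sh_scores


def delta_median(le_scores, sh_scores):
    """Difference between the median RED scores of Le and Sh genes."""
    return median(le_scores) - median(sh_scores)


# Hypothetical RED scores and q-values for five genes.
kd_red = {"g1": 1.2, "g2": -0.8, "g3": 0.9, "g4": -1.1, "g5": 2.0}
kd_q = {"g1": 0.01, "g2": 0.02, "g3": 0.5, "g4": 0.05, "g5": 0.03}
diff_red = {"g1": 1.0, "g2": -0.5, "g3": 3.0, "g4": -1.5, "g5": 2.0}

le, sh = split_by_kd_direction(kd_red, kd_q, diff_red)
delta = delta_median(le, sh)
```

In this toy input, g3 is dropped by the q-value cutoff, so Δ compares two Le genes with two Sh genes.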
Fig 8. APA models.
(A) Regulation of C/P events near the transcription start site (TSS) and in introns. Both U1 and PABPN1 inhibit poly(A)+ transcript expression near the TSS. For U1, inhibition of sense strand RNA is more prominent than that of upstream antisense RNA (uaRNA) due to the higher frequency of its binding sites, i.e., 5’ splice sites or related sequences, in the sense strand. The mechanism is likely to be inhibition of polyadenylation. PABPN1 has the opposite trend, with suppression of uaRNA expression being more significant. This function of PABPN1 is likely to involve exosome-mediated RNA decay. U1 and U2 inhibit intronic C/P because C/P is in a kinetic competition with splicing. (B) Regulation of pA usage in the last intron and the 3’-most exon. CFI-25/68 promotes usage of distal pAs through binding to the UGUA element, leading to longer 3’UTRs and selection of downstream terminal exons. PABPN1 and PABPC1 also promote distal pA usage, whereas Pcf11 and Fip1 promote proximal pA usage. Note that if two pAs are close to each other and there is an AAUAAA motif downstream of the proximal pA, Fip1 helps select the distal pA (not indicated in the graph). aUTR size is indicated to highlight its importance in APA regulation.
