RNA. 2012 Jan;18(1):1-15.
doi: 10.1261/rna.029249.111. Epub 2011 Nov 29.

Evidence for Widespread Association of Mammalian Splicing and Conserved Long-Range RNA Structures

Free PMC article


Dmitri D Pervouchine et al. RNA.

Abstract

Pre-mRNA structure impacts many cellular processes, including splicing in genes associated with disease. The contemporary paradigm of RNA structure prediction is biased toward secondary structures that occur within short ranges of pre-mRNA, although long-range base-pairings are known to be at least as important. Recently, we developed an efficient method for detecting conserved RNA structures on a genome-wide scale that does not require multiple sequence alignments and works equally well for local and long-range base-pairings. Using an enhanced version of this method that detects base-pairings at all possible combinations of splice sites within each gene, we now report RNA structures that could be involved in the regulation of splicing in mammals. Statistically, we demonstrate a strong association between the occurrence of conserved RNA structures and alternative splicing: local RNA structures are generally more frequent at alternative donor splice sites, while long-range structures are more often associated with weak alternative acceptor splice sites. As an example, we validated the RNA structure in the human SF1 gene using minigenes in the HEK293 cell line. Point mutations that disrupted the base-pairing of two complementary boxes between exons 9 and 10 of this gene altered the splicing pattern, while compensatory mutations that reestablished the base-pairing reverted splicing to the wild-type pattern. There is also statistical evidence for a Dscam-like class of mammalian genes, in which mutually exclusive RNA structures control mutually exclusive alternative splicing. In sum, we propose that long-range base-pairings carry an important, yet previously unconsidered, part of the splicing code, and that, even by modest estimates, there must be thousands of such potentially regulatory structures conserved throughout the evolutionary history of mammals.
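As a rough illustration of what detecting a base-pairing between two sequence windows involves, the sketch below finds the longest contiguous stretch of antiparallel complementarity (Watson-Crick pairs plus G-U wobble) between a 5′ window and a 3′ window. It is a minimal sketch under stated assumptions, not the authors' method: it looks at a single sequence, ignores conservation across species and folding energies, and the function name `longest_complementary_box` and the default minimum stem length of 6 are hypothetical choices. Because it compares the two windows directly, the genomic distance between them plays no role, which is the property that makes long-range pairs as easy to find as local ones.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_complementary_box(five: str, three: str, min_len: int = 6):
    """Return (length, i, j) for the longest stem in which five[i + k] pairs
    with three[j + length - 1 - k] for k = 0..length-1, or None if the best
    stem is shorter than min_len. Hypothetical helper for illustration."""
    rev = three[::-1]  # reversing the 3' window turns antiparallel pairing into a diagonal
    best_len, best_i, best_j = 0, -1, -1
    prev = [0] * (len(rev) + 1)  # run lengths from the previous row of the DP table
    for a in range(1, len(five) + 1):
        cur = [0] * (len(rev) + 1)
        for b in range(1, len(rev) + 1):
            if (five[a - 1], rev[b - 1]) in PAIRS:
                cur[b] = prev[b - 1] + 1  # extend the pairing run along the diagonal
                if cur[b] > best_len:
                    best_len = cur[b]
                    best_i = a - cur[b]        # start of the box in the 5' window
                    best_j = len(three) - b    # start of the box in the 3' window
        prev = cur
    if best_len < min_len:
        return None
    return best_len, best_i, best_j


if __name__ == "__main__":
    # Toy windows: GCAUCG in the 5' window pairs with CGAUGC in the 3' window.
    print(longest_complementary_box("CCGCAUCGCC", "AACGAUGCAA"))  # (6, 2, 2)
```

The dynamic-programming table costs O(|five| × |three|) time, which is cheap for short splice-site windows but is exactly why a genome-wide search needs a smarter strategy than this exhaustive comparison.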

Figures

FIGURE 1.
(A) Sequence windows surrounding donor and acceptor splice sites, consisting of $l_e$ nucleotides within the exon and $l_i$ nucleotides within the intron. (B) Arrangements of complementary boxes. Each of the two complementary boxes (5′-box and 3′-box) can be located at either a donor or an acceptor splice site. If the 5′- and 3′-boxes (filled circles) are located at the same splice site, the structure is referred to as a cis-structure; if they are located at different splice sites, it is referred to as a trans-structure. There is no limit on the distance between boxes. Complementarity between boxes is denoted by dotted arcs. The two-letter code denotes the location of the boxes; for instance, DA stands for the Donor-Acceptor location of the 5′- and 3′-boxes (in this order).
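The window and arrangement definitions in panels A and B can be sketched as follows. Everything here is an illustrative assumption rather than the authors' code: the hypothetical `SpliceSite` type, the convention that `pos` is the 0-based index of the exon/intron junction in the pre-mRNA string, and the choice to enumerate every ordered pair of sites in a gene (including a site paired with itself, which yields the cis case) with `itertools.combinations_with_replacement`. The window sizes $l_e$ and $l_i$ are left as parameters because the caption does not fix them.

```python
from dataclasses import dataclass
from itertools import combinations_with_replacement


@dataclass(frozen=True)
class SpliceSite:
    kind: str  # "D" for donor, "A" for acceptor
    pos: int   # assumed: 0-based index of the exon/intron junction


def window(seq: str, site: SpliceSite, le: int, li: int) -> str:
    """Window of le exonic and li intronic nucleotides around a splice site."""
    if site.kind == "D":  # donor: exon upstream of the junction, intron downstream
        start, end = site.pos - le, site.pos + li
    else:                 # acceptor: intron upstream, exon downstream
        start, end = site.pos - li, site.pos + le
    return seq[max(start, 0):min(end, len(seq))]


def arrangement(site5: SpliceSite, site3: SpliceSite) -> tuple[str, str]:
    """Two-letter code (5'-box site, 3'-box site) and cis/trans label."""
    code = site5.kind + site3.kind
    return code, ("cis" if site5 == site3 else "trans")


def candidate_pairs(sites):
    """Yield every (5'-box site, 3'-box site) pair with the 5' site upstream
    or identical, together with its arrangement code and cis/trans label."""
    ordered = sorted(sites, key=lambda s: s.pos)
    for s5, s3 in combinations_with_replacement(ordered, 2):
        code, label = arrangement(s5, s3)
        yield s5, s3, code, label


if __name__ == "__main__":
    # Toy pre-mRNA: exon CCCCC, a 16-nt intron, exon GGGGG.
    seq = "CCCCC" + "GUAAGU" + "U" * 7 + "CAG" + "GGGGG"
    donor, acceptor = SpliceSite("D", 5), SpliceSite("A", 21)
    print(window(seq, donor, le=3, li=4))     # CCCGUAA
    print(window(seq, acceptor, le=3, li=4))  # UCAGGGG
    for _, _, code, label in candidate_pairs([donor, acceptor]):
        print(code, label)                    # DD cis, DA trans, AA cis
```

With a single intron this produces the DD, DA, and AA cases; the AD arrangement appears once a gene has an acceptor upstream of a donor, as around an internal exon.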
FIGURE 2.
Classes of annotated splicing events associated with RNA structures in the DA arrangement. The observed proportions are shown relative to the total number of predicted structures corresponding to RefSeq-confirmed splicing events. The expected percentages were computed from the population proportions for a random sample of the same size. Error bars denote standard errors for proportions (Samuels and Witmer 2003). Types of splicing events (not mutually exclusive) are: alternative (Alt; see Materials and Methods for definition), alternative acceptor site (Acc), alternative donor site (Don), intron-containing internal polyadenylation site (PolyA), intron-containing alternative transcription initiation site (Tx init), and intron containing one or multiple cassette exons (Cas Exn).
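The observed-versus-expected comparison in this figure rests on two simple quantities: the observed proportion $\hat p = k/n$ with its standard error $\sqrt{\hat p(1-\hat p)/n}$, and the expected count $p_0 n$ for a random sample of the same size drawn from a population with proportion $p_0$. The sketch below computes both; the function names and the numbers in the example are hypothetical and are not taken from the figure.

```python
import math


def proportion_with_se(k: int, n: int) -> tuple[float, float]:
    """Observed proportion k/n and its standard error sqrt(p(1-p)/n)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p = k / n
    return p, math.sqrt(p * (1 - p) / n)


def expected_count(population_proportion: float, n: int) -> float:
    """Expected number of events of a class in a random sample of size n."""
    return population_proportion * n


if __name__ == "__main__":
    # Hypothetical: 30 of 100 predicted structures fall in one event class,
    # whose population proportion is 0.2.
    p, se = proportion_with_se(30, 100)
    print(f"observed {100 * p:.1f}% +/- {100 * se:.1f}%")      # observed 30.0% +/- 4.6%
    print(f"expected {100 * expected_count(0.2, 100) / 100:.1f}%")  # expected 20.0%
```

An observed bar that sits several standard errors away from its expected value is what the figure reads as enrichment or depletion of a splicing-event class.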
FIGURE 3.
Distributions of splice site strengths (see Materials and Methods) of donor (left three box plots) and acceptor (right three box plots) splice sites associated with RNA structures (Boxes), compared with the corresponding distributions of strengths of all (All) and alternative (Alternative) splice sites. Acceptor splice sites associated with RNA structures, but not donor splice sites, are on average weaker than alternative splice sites.
FIGURE 4.
(A) Splicing schema of the human SF1 gene (Splicing Factor 1). Cylinders denote exons, which are enumerated in the 5′-to-3′ direction; filled circles denote boxes; dotted arcs denote complementarity between boxes; solid lines denote splicing; the arrow denotes an alternative transcription start. Box C is complementary to box F (P-value ≅ $10^{-17}$), which is also complementary to box E (P-value ≅ $10^{-19}$). Box G is complementary to both box H (P-value ≅ $10^{-12}$) and box I (P-value ≅ $10^{-14}$). (B–E) Multiple sequence alignments describing box E-box F pairing (B), box C-box F pairing (C), box G-box I pairing (D), and box G-box H pairing (E). Complementary nucleotides are highlighted. Framed capital nucleotides denote exons. Asterisks denote conserved positions.
FIGURE 5.
Splicing to the alternative acceptor site in the SF1 minigene is regulated by the stem structure formed by the conserved box sequences. (A) Schematic representation of the SF1 minigene, which contains the chromosomal region chr11:64,535,223-64,535,752 (UCSC Genome Browser) from exon 9 to exon 10 of the SF1 gene. Alternative acceptor sites are shown by vertical arrows; the locations of the primers used for amplification of the minigene are indicated by horizontal arrows. (B) The secondary structure formed by the conserved boxes affects acceptor site usage. mRNA products expressed from the wild-type (wt) minigene and from minigenes mutated within conserved box E, box F, or both (E/F) were reverse-transcribed and analyzed on a 2% agarose gel. (M) Size markers (100-bp DNA ladder); (C) control (PCR in the absence of template). The addition (+) or absence (−) of reverse transcriptase (RT) in the reaction is indicated. The positions of the unspliced (578-bp) and spliced (317-bp and 296-bp) products are shown on the right. (C) Predicted base-pairing for the wild type and the box E, box F, and box E/F mutants (point mutations are shown in boldface), with the estimated equilibrium free energies. The box E sequence is shown above the box F sequence. (D) Comparison of the nucleotide sequences of alternatively spliced products from the wild-type and box E/F mutant minigenes (upper) and from the box E and box F mutant minigenes (lower).
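The logic of the box E/F experiment (disrupt one side of the stem, disrupt the other side, then restore pairing with both mutations together) can be illustrated with a small complementarity check. The 6-nt sequences below are hypothetical placeholders, not the SF1 boxes, and box E is treated as the 5′ partner purely for illustration. The check only lists non-pairing positions (Watson-Crick plus G-U wobble); it does not estimate the equilibrium free energies reported in panel C, which would require an RNA folding energy model.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def mismatches(box5: str, box3: str) -> list[int]:
    """Positions k in box5 whose antiparallel partner box3[-1 - k] cannot pair."""
    if len(box5) != len(box3):
        raise ValueError("boxes must have equal length for a perfect stem")
    n = len(box3)
    return [k for k in range(len(box5)) if (box5[k], box3[n - 1 - k]) not in PAIRS]


if __name__ == "__main__":
    # Hypothetical boxes: E pairs with F along its full length.
    box_e_wt, box_f_wt = "GCAUCG", "CGAUGC"
    box_e_mut = "GCCUCG"  # A -> C at position 2 of box E
    box_f_mut = "CGAGGC"  # U -> G at the partner position of box F
    for label, e, f in [("wt", box_e_wt, box_f_wt), ("E", box_e_mut, box_f_wt),
                        ("F", box_e_wt, box_f_mut), ("E/F", box_e_mut, box_f_mut)]:
        print(label, mismatches(e, f))  # wt [], E [2], F [2], E/F []
```

Each single mutant breaks one base pair, while the double mutant replaces an A-U pair with a C-G pair, so the stem is restored with a different sequence. That is why reversion of splicing in the E/F mutant points to the structure, rather than the primary sequence, as the regulator. Separately, the two spliced products in panel B differ by 317 − 296 = 21 bp; since both are amplified with the same primers, this difference should correspond to the spacing between the two alternative acceptor sites.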
FIGURE 6.
(A) Splicing schema of the gene Slc39a7 (also known as Ke4, ZIP7), which encodes a transporter involved in zinc homeostasis (Huang et al. 2005). Box A is complementary to box B (DD arrangement, P-value ≅ $10^{-18}$). Box B is complementary to box C (DA arrangement, P-value ≅ $10^{-13}$). As shown by the lines, the 3′-most donor splice site of exon 2 and the 5′-most acceptor splice site of exon 3 are either both used or both unused. (B,C) The rest of the legend is the same as in Figure 4.
FIGURE 7.
(A) Splicing schema of the gene ZFX, which exemplifies the AD arrangement of boxes. Exon 10 is flanked by complementary boxes, box A and box B (P-value ≅ $10^{-13}$). (B) As in Figure 4.
FIGURE 8.
Structures in AA arrangement. (A) Trans-AA structure in the HNRNPK gene (P-value ≅ $10^{-9}$). (C) Cis-AA structure in the ZNF384 gene. The structure formed by box A and box B (P-value ≅ $10^{-7}$) masks the polypyrimidine tract preceding exon 10. (B,D) As in Figure 4.
FIGURE 9.
Structures in DD arrangement. (A) Trans-DD structure in the SRSF7 gene (P-value ≅ $10^{-21}$). (C) Cis-DD structure in the PRPF39 gene (P-value ≅ $10^{-11}$). Box A overlaps the cryptic donor splice site indicated by the arrow in the multiple sequence alignment (D). The consensus score of the cryptic site sequence GUAAGC is higher than that of the endogenous donor site (GUGCGU). (B,D) As in Figure 4.

Cited by 23 articles
