PLoS Comput Biol. 2010 Nov 24;6(11):e1001016.
doi: 10.1371/journal.pcbi.1001016.

Genome-wide association between branch point properties and alternative splicing

André Corvelo et al. PLoS Comput Biol.

Abstract

The branch point (BP) is one of the three obligatory signals required for pre-mRNA splicing. In mammals, the degeneracy of the motif combined with the lack of a large set of experimentally verified BPs complicates the task of modeling it in silico, and therefore of predicting the location of natural BPs. Consequently, BPs have been disregarded in a considerable fraction of the genome-wide studies on the regulation of splicing in mammals. We present a new computational approach for mammalian BP prediction. Using sequence conservation and positional bias, we obtained a set of motifs in good agreement with U2 snRNA binding stability. Using a Support Vector Machine algorithm, we created a model complemented with polypyrimidine tract features, which considerably improves the prediction accuracy over previously published methods. Applying our algorithm to human introns, we show that BP position is highly dependent on the presence of AG dinucleotides at the 3' end of introns, with distance to the 3' splice site and BP strength strongly correlating with alternative splicing. Furthermore, experimental BP mapping for five exons preceded by long AG-dinucleotide exclusion zones revealed that, for a given intron, more than one BP can be chosen throughout the course of splicing. Finally, the comparison between exons of different evolutionary ages and pseudo exons suggests a key role of the BP in the pathway of exon creation in humans. Our computational and experimental analyses suggest that BP recognition is more flexible than previously assumed, and it appears highly dependent on the presence of downstream polypyrimidine tracts. The reported association between BP features and the splicing outcome suggests that this element, so far disregarded yet crucial, carries information that can complement current acceptor site models.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. AGEZ definition and BP search region.
BP location relative to the 3SS (a1) depends on the presence or absence of additional AG dinucleotides in the intron. The most common situation is the absence of AGs in the region between the BP and the 3SS. However, AGs can occur either at locations close to the 3SS (i.e. a2 in r1), where they may compete with the 3SS signal, or very close to the BP (i.e. a3 in r3), where they are bypassed, possibly due to steric constraints. Any AG occurring in r2 is likely to be recognized as the 3SS. Therefore, BPs are usually located inside the region defined as r1+r2+r3, the AG exclusion zone (AGEZ).
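The idea of scanning upstream from the 3SS for the AG that bounds the search region can be sketched in a few lines of Python. This is a simplified illustration, not the authors' procedure: the function name `agez_start` and the `min_ag_dist` cutoff (standing in for the extent of r1) are hypothetical, the zone is assumed to begin immediately downstream of the bounding AG, and the fine structure of r3 is not modeled.

```python
def agez_start(intron: str, min_ag_dist: int = 12) -> int:
    """Return the 0-based index where the sketched AG exclusion zone begins.

    Assumptions (not taken from the source): the intron is given 5'->3' and
    ends with the 3SS AG; AGs closer than `min_ag_dist` nt to the 3SS AG are
    treated as lying in r1 and are ignored; the zone starts right after the
    first AG found further upstream.
    """
    seq = intron.upper()
    ss = len(seq) - 2  # index of the A in the 3SS AG
    # Walk upstream, starting at the first position at least min_ag_dist away.
    for i in range(ss - min_ag_dist, -1, -1):
        if seq[i:i + 2] == "AG":
            return i + 2  # zone begins just downstream of this AG
    return 0  # no upstream AG found: the zone spans the whole sequence


if __name__ == "__main__":
    toy_intron = "AAAG" + "CTAA" + "C" * 16 + "AG"
    print(agez_start(toy_intron))
```

Under these assumptions, BP candidates would be searched between the returned index and the 3SS. A real implementation would need the cutoffs used in the paper's Methods.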
Figure 2. Building a set of conserved putative BPs.
A – Distribution of mammalian-wide conserved TNA instances in the last 300 nt of human introns. The blue line represents the mean frequency. The dashed red lines represent the mean ± the standard deviation. The grey area represents the region between 55 and 15 nt upstream of the 3SS. B, C and D – Distribution over the last 300 nt of human introns of the mammalian-conserved instances of 3 example pentamers belonging to different categories: no association with any positionally biased signal (B), PPT-associated (C) and BP-associated (D). The blue line represents the distribution of all (conserved and non-conserved) instances. The grey area represents the region between 55 and 15 nt upstream of the 3SS. E – Scheme of the strategy employed to build a set of conserved putative BPs. We selected conserved TNA instances located between 55 and 15 nt upstream of the 3SS and unique to the last 300 nt of the intron, if overlapped by at least one BP-associated pentamer in all species considered (see Methods).
Figure 3. BP signal characterization.
A – Information content per motif position. Positions 4 and 6 were both fixed beforehand. B – Sequence logo for the consTNA-BP5 set. The height of each letter represents the frequency of that nucleotide at the respective position. C – Mutual information between BP signal positions. Blank spaces represent the two invariable positions 4 and 6.
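Per-position information content of the kind shown in panel A can be illustrated with a short Python sketch. It assumes the common definition for DNA motifs (2 bits minus the Shannon entropy of the observed nucleotide frequencies, with a uniform background and no small-sample correction). The caption does not state which variant the authors used, and the function name and toy motifs are hypothetical.

```python
import math
from collections import Counter


def information_content(motifs: list[str]) -> list[float]:
    """Information content (bits) at each position of equal-length DNA motifs.

    Assumes a uniform background, so the maximum is log2(4) = 2 bits.
    """
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("all motifs must have the same length")
    ic = []
    for pos in range(length):
        counts = Counter(m[pos] for m in motifs)
        total = sum(counts.values())
        # Shannon entropy of the observed nucleotide frequencies.
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic


if __name__ == "__main__":
    # Toy input: first position invariant, second position uniform.
    print(information_content(["TA", "TC", "TG", "TT"]))
```

A position fixed to a single nucleotide, like positions 4 and 6 in the caption, reaches the 2-bit maximum under this definition.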
Figure 4. Sequence counts correlate with U2 binding energy.
Barplot showing, for each nonamer cluster, the U2 binding energy (blue) and the number of occurrences in the consTNA and consTNA-BP5 sets (grey and green, respectively). The fraction of cases eliminated by using the 124 BP-associated pentamers is also shown in orange. Nonamer clusters were grouped by core pentamer (5 central positions).
Figure 5. Benchmarking on a set of experimentally verified BPs.
A – Ranking of experimentally verified BPs according to 4 predictive methods. Blank cells represent BPs that either did not match the initial sequence requirements or are located outside the search region. Although the Schwartz method ranks the BP 1st in several introns, the prediction is discarded because it is not the candidate closest to the 3SS (white asterisks). B – Overlap of correct predictions between methods. C – Percentage of introns in which each method correctly predicted the BP. The error bars represent the standard error given by the formula $\sqrt{p(1-p)/n}$, where $p$ is the probability and $n$ the overall sample size.
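The error bars in panel C can be reproduced numerically with a minimal sketch, assuming the caption refers to the usual standard error of a proportion, sqrt(p(1-p)/n). The function name is hypothetical and the numbers in the example are illustrative, not values from the paper.

```python
import math


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n observations."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    # Illustrative values only: a method correct in half of 100 introns.
    print(proportion_standard_error(0.5, 100))
```

The error shrinks with the square root of the sample size, so benchmarks on small sets of verified BPs carry wide error bars.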
Figure 6. Predicted human branch points.
A – Histogram representing the distribution of BS positions relative to the AGEZ-defining AG dinucleotide (a3 in Figure 1). The grey region represents positions that are biased by the presence of the AG dinucleotide. The dashed red line represents the leftmost point where the distribution differs from the expected uniform distribution. The exact position of the AG dinucleotide is shown on the x-axis. For this plot, top scoring candidates over the last 500 nt were considered in order to obtain the left background tail. For visualization purposes, only positions from −30 to +30 nt relative to the AG are shown. B – Pie chart showing the number of introns in the initial dataset (N = 183187) for which no predictions were obtained (None), no predictions falling inside the 1st AGEZ were obtained (None in AGEZ), the top prediction inside the 1st AGEZ has a negative SVM score (Negative scoring) and the top prediction inside the 1st AGEZ scores positively (Positive scoring). C – Histogram showing the distribution of predicted BS distances relative to the 3SS. Only top scoring candidates inside the AGEZ were considered.
Figure 7. BP sequence, position, intron length and exon skipping.
A, B – Percentage of exons with skipping evidence (A) and average exon EST inclusion level (B) as a function of BP distance. These values were computed using a sliding window of length 20 and step 10. C – Percentage of exons with skipping evidence as a function of BP sequence score. This was computed using a sliding window of length 1 and step 0.25. D – Mean BP sequence score as a function of intron length. This was computed in bins of 100 nt. The error bars represent the standard error. In A and C, the standard error is given by the formula $\sqrt{p(1-p)/n}$, where $p$ is the probability and $n$ the overall sample size.
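The sliding-window summary used in panels A and C can be sketched as follows. The window length and step come from the caption. The half-open window boundaries, the record layout (BP distance paired with a skipping flag) and the function name are assumptions made for illustration.

```python
def sliding_window_percent(
    records: list[tuple[float, bool]],
    start: float,
    stop: float,
    length: float = 20,
    step: float = 10,
) -> list[tuple[float, float, float, int]]:
    """Percentage of skipped exons per window of BP distance.

    `records` holds (bp_distance, has_skipping_evidence) pairs.
    Windows are half-open [lo, lo + length), an assumed convention.
    Returns (lo, hi, percent_skipped, n_exons) for each non-empty window.
    """
    results = []
    lo = start
    while lo + length <= stop:
        hi = lo + length
        flags = [skipped for dist, skipped in records if lo <= dist < hi]
        if flags:
            percent = 100.0 * sum(flags) / len(flags)
            results.append((lo, hi, percent, len(flags)))
        lo += step
    return results


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    toy = [(15, True), (22, False), (28, True), (35, False)]
    for window in sliding_window_percent(toy, start=10, stop=40):
        print(window)
```

Because the step is half the window length, consecutive windows overlap and each exon contributes to up to two windows, which smooths the curve at the cost of correlated neighbouring points.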
Figure 8. BP features and exon age.
A – Mean SVM score for BPs preceding pseudo exons, primate-specific exons and mammalian-conserved exons. B – BP features for the same three exon groups: sequence score (top left), pyrimidine content between BP and 3SS (top right), downstream PPT score (bottom left) and distance to the downstream PPT (bottom right). C – BP-3SS distance for exons in the three above-mentioned categories.
