CoSREM: a graph mining algorithm for the discovery of combinatorial splicing regulatory elements

BMC Bioinformatics. 2015 Sep 4;16:285. doi: 10.1186/s12859-015-0698-6.

Abstract

Background: Alternative splicing (AS) is a post-transcriptional mechanism for regulating gene expression. Splicing decisions are affected by the combinatorial behavior of different splicing factors that bind to multiple binding sites in exons and introns. These binding sites are called splicing regulatory elements (SREs). Here we develop CoSREM (Combinatorial SRE Miner), a graph mining algorithm to discover combinatorial SREs in human exons. Our model does not assume a fixed SRE length and incorporates experimental evidence to increase accuracy. CoSREM can identify sets of SREs and, unlike current approaches, is not limited to SRE pairs.

Results: We identified 37 SRE sets that include both enhancer and silencer elements. Our results overlap with previously reported results, including some that were obtained experimentally. We also show that the SRE set GGGAGG and GAGGAC identified by CoSREM may play a role in exon skipping events in several tumor samples. We applied CoSREM to RNA-Seq data for multiple tissues to identify combinatorial SREs that may be responsible for exon inclusion or exclusion across tissues.

Conclusion: The new algorithm can identify different combinations of splicing enhancers and silencers without assuming a predefined size and without being limited to pairs of SREs. Our approach opens new directions to study SREs and the roles that AS may play in diseases and tissue specificity.
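
To make the idea of SRE sets that are "not limited to pairs" concrete, the following is a minimal illustrative sketch, not the CoSREM algorithm itself. The abstract does not describe the graph construction, so this example swaps in a simple level-wise (Apriori-style) search over plain substring matches. The function names, the `min_support` threshold, and the toy inputs are all hypothetical.

```python
def exons_containing(sre, exons):
    """Return the indices of exons whose sequence contains the SRE string."""
    return frozenset(i for i, seq in enumerate(exons) if sre in seq)


def cooccurring_sre_sets(exons, candidate_sres, min_support=2):
    """Find SRE sets (size >= 2) whose members all occur in the same exons.

    Assumption: an SRE "occurs" in an exon if it is an exact substring.
    This is a stand-in for illustration, not the scoring used by CoSREM.
    """
    # Level 1: keep single SREs seen in at least min_support exons.
    singles = {}
    for sre in candidate_sres:
        support = exons_containing(sre, exons)
        if len(support) >= min_support:
            singles[frozenset([sre])] = support

    results = {}
    level = dict(singles)
    while level:
        next_level = {}
        for sre_set, support in level.items():
            for single, single_support in singles.items():
                if single <= sre_set:
                    continue  # this SRE is already in the set
                merged = sre_set | single
                if merged in next_level:
                    continue  # already built from another parent set
                shared = support & single_support
                if len(shared) >= min_support:
                    next_level[merged] = shared
        results.update(next_level)
        level = next_level  # grow sets by one SRE per round
    return results
```

Because each round extends the surviving sets by one element, the search returns sets of any size that meet the support threshold rather than stopping at pairs. That is the property the abstract highlights, although the real method may reach it very differently.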

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Computer Graphics*
  • Exons / genetics
  • Gene Expression Regulation, Neoplastic*
  • Humans
  • Introns / genetics
  • Neoplasm Proteins / genetics*
  • Neoplasms / genetics*
  • RNA Splicing / genetics*
  • Regulatory Sequences, Nucleic Acid / genetics*

Substances

  • Neoplasm Proteins