Splicing of long non-coding RNAs primarily depends on polypyrimidine tract and 5' splice-site sequences due to weak interactions with SR proteins

Nucleic Acids Res. 2019 Jan 25;47(2):911-928. doi: 10.1093/nar/gky1147.


Many nascent long non-coding RNAs (lncRNAs) undergo the same maturation steps as pre-mRNAs of protein-coding genes (PCGs), but they are often poorly spliced. To identify the underlying mechanisms, we searched for putative splicing inhibitory sequences using ncRNA-a2 as a model. Genome-wide analyses of intergenic lncRNAs (lincRNAs) revealed that lincRNA splicing efficiency correlates positively with 5' splice site (5'ss) strength, whereas no such correlation was found for PCGs. In addition, efficiently spliced lincRNAs have a higher thymidine (T) content in the polypyrimidine tract (PPT) than efficiently spliced PCGs. Using model lincRNAs, we provide experimental evidence that strengthening the 5'ss and increasing the T content of the PPT significantly enhance lincRNA splicing. We further show that lincRNA exons contain fewer putative binding sites for SR proteins. To map SR protein binding to lincRNAs, we performed iCLIP with SRSF2, SRSF5 and SRSF6 and analyzed eCLIP data for SRSF1, SRSF7 and SRSF9. All examined SR proteins bind lincRNA exons to a much lower extent than they bind exons of expression-matched PCGs. We propose that lincRNAs lack the cooperative network of interactions that enhances splicing, which makes their splicing outcome more dependent on splice-site optimality.
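The abstract distinguishes T content from overall pyrimidine content in the PPT. The following is a minimal illustrative sketch of that distinction, not the authors' pipeline. It assumes, hypothetically, that the PPT can be approximated as a fixed window (here 20 nt) immediately upstream of the intron's terminal AG; the function name `ppt_composition` and the window size are illustrative choices, not taken from the paper.

```python
def ppt_composition(intron: str, window: int = 20) -> tuple[float, float]:
    """Return (T fraction, pyrimidine fraction) of an approximated PPT.

    Hypothetical assumption: the PPT is taken as the `window` nucleotides
    immediately upstream of the intron's terminal AG dinucleotide.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = intron.upper().replace("U", "T")  # accept RNA or DNA input
    if not seq.endswith("AG"):
        raise ValueError("expected intron to end with the canonical AG")
    if len(seq) < window + 2:
        raise ValueError("intron shorter than window + AG")
    ppt = seq[-(window + 2):-2]  # region just upstream of the 3' AG
    t_frac = ppt.count("T") / window
    pyr_frac = (ppt.count("T") + ppt.count("C")) / window
    return t_frac, pyr_frac


# Two toy introns with identical pyrimidine content but different T content.
t_rich = "GTAAGT" + "A" * 10 + "T" * 15 + "C" * 5 + "AG"
c_rich = "GTAAGT" + "A" * 10 + "C" * 15 + "T" * 5 + "AG"
print(ppt_composition(t_rich))  # (0.75, 1.0)
print(ppt_composition(c_rich))  # (0.25, 1.0)
```

Both toy PPTs are fully pyrimidine-rich, so a pyrimidine-only score cannot separate them; counting T separately does, which is the property the abstract links to more efficient lincRNA splicing.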

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • HeLa Cells
  • Humans
  • Introns*
  • Pyrimidines / analysis
  • RNA Splice Sites*
  • RNA Splicing*
  • RNA, Long Noncoding / metabolism*
  • Serine-Arginine Splicing Factors / metabolism*

Substances

  • Pyrimidines
  • RNA Splice Sites
  • RNA, Long Noncoding
  • Serine-Arginine Splicing Factors
  • pyrimidine