SAW: a method to identify splicing events from RNA-Seq data based on splicing fingerprints

PLoS One. 2010 Aug 10;5(8):e12047. doi: 10.1371/journal.pone.0012047.

Abstract

Splicing event identification is one of the most important issues in the comprehensive analysis of transcription profiles. Recent developments in next-generation sequencing technology have generated extensive profiles of alternative splicing. However, while many of these splicing events are between exons that are relatively close on the genome sequence, reads generated by RNA-Seq are not limited to alternative splicing between close exons but arise from virtually all splicing events. In this work, a novel method, SAW, was proposed for the identification of all splicing events based on short reads from RNA-Seq. It was observed that short reads not in known gene models are actually absent words from known gene sequences. An efficient method was developed to filter and cluster these short reads by fingerprint fragments of splicing events, without aligning the short reads to genome sequences. The possible splicing sites were also determined without alignment against genome sequences. A consensus sequence was then generated for each short read cluster and aligned to the genome sequences. Results demonstrated that this method could identify more than 90% of the known splicing events with a very low false discovery rate, and could also accurately identify a number of novel splicing events between distant exons.
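The filtering and clustering idea described in the abstract can be illustrated with a minimal sketch. This is not the SAW implementation: it assumes that an "absent word" is a fixed-length k-mer that does not occur in any known transcript sequence, and that reads sharing at least one absent k-mer belong to the same cluster. The function names, the choice of k, and the union-find grouping are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def known_words(transcripts, k):
    """Collect all k-mers that occur in the known transcript sequences."""
    words = set()
    for transcript in transcripts:
        words.update(kmers(transcript, k))
    return words


def absent_words(read, words, k):
    """Return the k-mers of a read that never occur in the known sequences."""
    return [w for w in kmers(read, k) if w not in words]


def cluster_reads(reads, words, k):
    """Group reads that share at least one absent k-mer (assumed fingerprint).

    Reads whose k-mers are all known are filtered out. No genome alignment
    is performed at this stage.
    """
    parent = list(range(len(reads)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    owner = {}       # absent k-mer -> index of the first read containing it
    candidates = []  # indices of reads with at least one absent k-mer
    for idx, read in enumerate(reads):
        absent = absent_words(read, words, k)
        if not absent:
            continue
        candidates.append(idx)
        for w in absent:
            if w in owner:
                parent[find(idx)] = find(owner[w])
            else:
                owner[w] = idx

    clusters = defaultdict(list)
    for idx in candidates:
        clusters[find(idx)].append(reads[idx])
    return list(clusters.values())
```

In this sketch, a read that spans a junction between two exons not adjacent in any known transcript contains k-mers missing from the known set, so it survives the filter. Reads covering the same junction at different offsets share some of those k-mers and fall into one cluster, from which a consensus sequence could then be built and aligned to the genome.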

MeSH terms

  • Animals
  • Base Sequence
  • Cluster Analysis
  • Computational Biology / methods*
  • Databases, Genetic
  • Exons / genetics
  • Genome / genetics
  • Mice
  • RNA / genetics*
  • RNA Splicing / genetics*
  • Sequence Analysis, DNA

Substances

  • RNA