Modeling alternative splicing variants from RNA-Seq data with isoform graphs

J Comput Biol. 2014 Jan;21(1):16-40. doi: 10.1089/cmb.2013.0112. Epub 2013 Nov 7.

Abstract

Next-generation sequencing (NGS) technologies call for new methodologies for alternative splicing (AS) analysis. Current computational methods for AS analysis from NGS data are mainly based on aligning short reads against a reference genome, while methods that do not need a reference genome remain largely underdeveloped. In this context, the main tools developed for NGS data focus on de novo transcriptome assembly (Grabherr et al., 2011; Schulz et al., 2012). Although these tools are extensively applied in biological investigations, and their results often reveal intrinsic shortcomings, a theoretical investigation of the inherent computational limits of transcriptome analysis from NGS data, when a reference genome is unknown or highly unreliable, is still missing. Moreover, we still lack methods for computing the gene structures due to AS events under the above assumptions, a problem that we start to tackle in this article. More precisely, based on the notion of isoform graph (Lacroix et al., 2008), we define a compact representation of gene structures, called the splicing graph, and investigate the computational problem of building a splicing graph that is (i) compatible with NGS data and (ii) isomorphic to the isoform graph. We characterize when there is only one representative splicing graph compatible with the input data, and we propose an efficient algorithmic approach to compute this graph.
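As a rough illustration of the kind of graph the abstract refers to, the toy sketch below builds a directed graph from known isoforms. It assumes each isoform is already given as an ordered list of block labels, and it links two blocks whenever they are consecutive in some isoform. This is not the article's algorithm, which works from reads without a reference genome. The function name `build_isoform_graph` and the block labels are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def build_isoform_graph(isoforms):
    """Map each block to the sorted list of blocks that directly follow it
    in at least one isoform (toy model, block labels are assumptions)."""
    graph = defaultdict(set)
    for blocks in isoforms:
        # Register every block as a node, even if it has no successor.
        for block in blocks:
            graph[block]
        # Add an edge for each pair of consecutive blocks in this isoform.
        for left, right in zip(blocks, blocks[1:]):
            graph[left].add(right)
    return {node: sorted(successors) for node, successors in graph.items()}


# Two hypothetical isoforms of one gene: the second one skips block "B".
isoforms = [["A", "B", "C"], ["A", "C"]]
graph = build_isoform_graph(isoforms)
print(graph)
```

In this toy input, the edge from "A" to "C" alongside the path through "B" is how a skipped block would appear in such a graph.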

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Alternative Splicing*
  • Computational Biology
  • Computer Graphics
  • Databases, Nucleic Acid / statistics & numerical data
  • Gene Expression Profiling / statistics & numerical data
  • High-Throughput Nucleotide Sequencing / statistics & numerical data
  • Models, Genetic*
  • Polymorphism, Single Nucleotide
  • Repetitive Sequences, Nucleic Acid
  • Sequence Alignment / statistics & numerical data
  • Sequence Analysis, RNA / statistics & numerical data