Parseq: reconstruction of microbial transcription landscape from RNA-Seq read counts using state-space models

Bioinformatics. 2014 May 15;30(10):1409-16. doi: 10.1093/bioinformatics/btu042. Epub 2014 Jan 27.

Abstract

Motivation: The most common RNA-Seq strategy consists of random shearing, amplification and high-throughput sequencing of the RNA fraction. Methods are needed to analyze variations in transcription level along the genome from the read count profiles that this protocol generates.

Results: We developed a statistical approach to estimate the local transcription levels and to identify transcript borders. This transcriptional landscape reconstruction relies on a state-space model to describe transcription level variations in terms of abrupt shifts and more progressive drifts. A new emission model is introduced to capture not only the read count variance inside a transcript but also its short-range autocorrelation and the fraction of positions with zero counts. The estimation relies on a particle Gibbs algorithm whose running time makes it more suited to microbial genomes. The approach outperformed read-overlapping strategies on synthetic and real microbial datasets.
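To make the modelling idea concrete, here is a minimal generative sketch in Python of the kind of model the Results paragraph describes: a latent transcription level that changes by rare abrupt shifts (transcript borders) and small progressive drifts, with read counts that are overdispersed and have an excess of zeros. It is not Parseq's actual parametrization. All function names, parameter names and default values (`p_shift`, `p_drift`, `drift_sd`, `p_off`, `dispersion`, `p_zero`) are assumptions chosen for illustration, the gamma-Poisson mixture is a stand-in for the paper's emission model, and the short-range autocorrelation of counts is deliberately left out.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw a Poisson variate with Knuth's multiplication method.

    Adequate for the moderate means used in this sketch.
    """
    if lam <= 0.0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_levels(rng, n, p_shift=0.01, p_drift=0.05, drift_sd=0.05, p_off=0.3):
    """Simulate a latent transcription level for n genome positions.

    At each position the level either stays the same, drifts slightly
    (multiplicative Gaussian step on the log scale), or shifts abruptly
    to a new value, which may be 0 (untranscribed region).
    All probabilities here are illustrative assumptions.
    """
    levels = []
    level = 0.0  # start in an untranscribed region
    for _ in range(n):
        u = rng.random()
        if u < p_shift:
            # Abrupt shift: a transcript border.
            if rng.random() < p_off:
                level = 0.0
            else:
                level = math.exp(rng.uniform(0.0, 4.0))
        elif u < p_shift + p_drift:
            # Progressive drift: small relative change of the level.
            level *= math.exp(rng.gauss(0.0, drift_sd))
        levels.append(level)
    return levels


def emit_counts(rng, levels, dispersion=2.0, p_zero=0.2):
    """Emit one read count per position given the latent levels.

    Counts are zero-inflated (extra zeros with probability p_zero) and
    overdispersed via a gamma-Poisson mixture with mean equal to the level
    and variance level + level**2 / dispersion.
    """
    counts = []
    for level in levels:
        if level == 0.0 or rng.random() < p_zero:
            counts.append(0)
            continue
        lam = rng.gammavariate(dispersion, level / dispersion)
        counts.append(sample_poisson(rng, lam))
    return counts


rng = random.Random(0)  # fixed seed so the simulation is reproducible
levels = simulate_levels(rng, 2000)
counts = emit_counts(rng, levels)
n_borders = sum(1 for a, b in zip(levels, levels[1:]) if (a == 0.0) != (b == 0.0))
print("positions:", len(counts))
print("zero-count positions:", sum(1 for c in counts if c == 0))
print("on/off borders in the latent profile:", n_borders)
```

Inference runs in the opposite direction: given only `counts`, the aim is to recover `levels` and the positions of the shifts. The abstract states that Parseq does this with a particle Gibbs algorithm, which this sketch does not implement.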

Availability: A program named Parseq is available at: http://www.lgm.upmc.fr/parseq/.

Contact: bodgan.mirauta@upmc.fr

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Escherichia coli / genetics
  • Gene Expression Profiling / methods
  • High-Throughput Nucleotide Sequencing / methods*
  • Markov Chains
  • Models, Genetic
  • Monte Carlo Method
  • RNA / genetics
  • Saccharomyces cerevisiae / genetics
  • Sequence Analysis, RNA / methods*
  • Transcription, Genetic

Substances

  • RNA