Motivation: The most common RNA-Seq strategy consists of random shearing, amplification and high-throughput sequencing of the RNA fraction. Methods to analyze transcription level variations along the genome from the read count profiles generated by the RNA-Seq protocol are needed.
Results: We developed a statistical approach to estimate the local transcription levels and to identify transcript borders. This transcriptional landscape reconstruction relies on a state-space model to describe transcription level variations in terms of abrupt shifts and more progressive drifts. A new emission model is introduced to capture not only the read count variance inside a transcript but also its short-range autocorrelation and the fraction of positions with zero counts. The estimation relies on a particle Gibbs algorithm whose running time makes it more suited to microbial genomes. The approach outperformed read-overlapping strategies on synthetic and real microbial datasets.
Availability: A program named Parseq is available at: http://www.lgm.upmc.fr/parseq/.
Contact: bodgan.mirauta@upmc.fr
Supplementary information: Supplementary data are available at Bioinformatics online.