Deep sequencing of bacterial transcriptomes by RNA-Seq has made it possible to identify small non-coding RNAs (sRNAs), RNA molecules that regulate gene expression in response to changing environments, on a genome-wide scale in an ever-increasing range of prokaryotes. However, a simple and reliable automated method for identifying sRNA candidates in these large datasets is lacking. Here, after generating a transcriptome from an exponential-phase culture of Mycobacterium tuberculosis H37Rv, we developed and validated an automated method for the genome-wide identification of regions containing sRNA candidates in RNA-Seq datasets, based on the characteristics of read coverage maps. We identified 192 novel candidate sRNA-encoding regions in intergenic regions and 664 transcripts arising from regions antisense (as) to open reading frames (ORFs), which bear the characteristics of asRNAs, and validated 28 of these novel sRNA-encoding regions by Northern blotting. Our work not only provides a simple automated method for genome-wide identification of candidate sRNA-encoding regions in RNA-Seq data, but also uncovers many novel candidate sRNA-encoding regions in M. tuberculosis, reinforcing the view that the control of gene expression in bacteria is more complex than previously anticipated.
Keywords: Mycobacterium tuberculosis; RNA-Seq; non-coding RNA; transcriptome.
© The Author 2016. Published by Oxford University Press on behalf of the Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences. All rights reserved. For permissions, please e-mail: firstname.lastname@example.org.
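
The abstract describes the method only as an analysis of read coverage map characteristics and gives no algorithmic detail. As an illustration of the general idea, and not the authors' actual procedure, the following minimal sketch scans a per-base coverage array for contiguous covered stretches of plausible sRNA length and keeps those that fall outside annotated ORFs. The function names, the depth threshold, and the length bounds are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based coordinates


def candidate_regions(
    coverage: Sequence[int],
    min_depth: int = 10,   # hypothetical threshold, not taken from the paper
    min_len: int = 30,     # hypothetical lower length bound
    max_len: int = 500,    # hypothetical upper length bound
) -> List[Interval]:
    """Return contiguous runs with depth >= min_depth and an allowed length."""
    regions: List[Interval] = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth:
            if start is None:
                start = pos  # a covered run begins here
        elif start is not None:
            # the run ended at pos (exclusive); keep it if the length fits
            if min_len <= pos - start <= max_len:
                regions.append((start, pos))
            start = None
    # handle a run that extends to the end of the array
    if start is not None and min_len <= len(coverage) - start <= max_len:
        regions.append((start, len(coverage)))
    return regions


def outside_annotations(
    regions: Sequence[Interval], annotated: Sequence[Interval]
) -> List[Interval]:
    """Keep regions that do not overlap any annotated interval (e.g. ORFs)."""
    return [
        (s, e)
        for s, e in regions
        if all(e <= a or s >= b for a, b in annotated)
    ]
```

Running the same scan on the coverage map of each strand separately, and keeping the regions that overlap an ORF annotated on the opposite strand instead of discarding them, would be one way to flag antisense candidates under the same assumptions.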