Motivation: RNA expression signals detected by high-density genomic tiling microarrays contain comprehensive transcriptomic information of the target organism. Current methods for determining the RNA transcription units are still computation intense and lack the discriminative power. This article describes an efficient and accurate methodology to reveal complicated transcriptional architecture, including small regulatory RNAs, in microbial transcriptome profiles.
Results: Normalized microarray data were first subject to support vector regression to estimate the profile tendency by reducing noise interruption. A hybrid supervised machine learning algorithm, hidden Markov support vector machines, was then used to classify the underlying state of each probe to 'expression' or 'silence' with the assumption that the consecutive state sequence was a heterogeneous Markov chain. For model construction, we introduced a profile geometry learning method to construct the feature vectors, which considered both intensity profiles and changes of intensities over the probe spacing. Also, a robust strategy was used to dynamically evaluate and select the training set based only on prior computer gene annotation. The algorithm performed better than other methods in accuracy on simulated data, especially for small expressed regions with lower (<1) SNR (signal-to-noise ratio), hence more sensitive for detecting small RNAs.
Availability and implementation: Detail implementation steps of the algorithm and the complete result of the transcriptome analysis for a microbial genome Porphyromonas gingivalis W83 can be viewed at http://bioinformatics.forsyth.org/mtd.