Background: Most existing RNA structure prediction programs fold a completely synthesized RNA molecule. Within the cell, however, RNA molecules emerge sequentially during the directed process of transcription. Dedicated experiments with individual RNA molecules have shown that RNA folds while it is being transcribed and that its correct folding can also depend on the speed of transcription.
Methods: The main aim of this work is to study whether and how co-transcriptional folding is encoded within the primary and secondary structure of RNA genes. To this end, we study the known primary and secondary structures of a comprehensive data set of 361 RNA genes, as well as a set of 48 RNA sequences that are known to differ from the originally transcribed sequence units. We detect co-transcriptional folding by defining two measures of directedness that quantify the extent of asymmetry between alternative helices lying 5' and those lying 3' of the known helices with which they compete.
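The abstract does not define the two directedness measures, so the following is only a hypothetical illustration of the general idea of a 5'/3' asymmetry score, not the authors' method. It assumes that a helix is a run of k stacked pairs (i, j), (i+1, j-1), ..., that two helices compete when they claim at least one common nucleotide, that an alternative helix counts as "5'" of a known helix when its 3'-most nucleotide is transcribed earlier, and that asymmetry is summarized as (n5 - n3) / (n5 + n3). The names `Helix`, `competes` and `directedness` are illustrative.

```python
from typing import List, NamedTuple, Set


class Helix(NamedTuple):
    """Hypothetical helix: k stacked pairs (i, j), (i+1, j-1), ..., (i+k-1, j-k+1), 1-based."""
    i: int
    j: int
    k: int

    def positions(self) -> Set[int]:
        # Nucleotides occupied by the 5' strand and the 3' strand of the helix
        return set(range(self.i, self.i + self.k)) | set(range(self.j - self.k + 1, self.j + 1))

    def end_3prime(self) -> int:
        # Last nucleotide that must be transcribed before the helix can fully form
        return self.j


def competes(a: Helix, b: Helix) -> bool:
    """Assumption: two helices compete if they share at least one nucleotide."""
    return bool(a.positions() & b.positions())


def directedness(known: List[Helix], alternatives: List[Helix]) -> float:
    """Hypothetical asymmetry score in [-1, 1]; positive means more competitors lie 5'."""
    n5 = n3 = 0
    for alt in alternatives:
        for ref in known:
            if not competes(alt, ref):
                continue
            # Classify by which helix can finish forming first during transcription
            if alt.end_3prime() < ref.end_3prime():
                n5 += 1
            elif alt.end_3prime() > ref.end_3prime():
                n3 += 1
    total = n5 + n3
    return (n5 - n3) / total if total else 0.0


if __name__ == "__main__":
    known = [Helix(1, 20, 4)]
    alts = [Helix(3, 12, 3), Helix(15, 30, 3), Helix(18, 40, 3)]
    print(directedness(known, alts))  # one 5' competitor, two 3' competitors
```

In this toy input, one competing alternative helix closes before the known helix and two close after it, so the score is (1 - 2) / 3 = -1/3. Counting each competing (alternative, known) pair once is a simplifying choice; a real analysis would also need to weigh helix stability and compare against randomized sequences to assess significance.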
Results: We show, with statistical significance, that co-transcriptional folding strongly influences RNA sequences in two ways: (1) alternative helices that would compete with the formation of the functional structure during co-transcriptional folding are suppressed, and (2) the formation of transient structures that may serve as guidelines for the co-transcriptional folding pathway is encouraged.
Conclusions: These findings have a number of implications for RNA secondary structure prediction methods and the detection of RNA genes.