Comparative Study
Genome Res. 2003 Sep;13(9):2042-51.
doi: 10.1101/gr.1257503.

Widespread Selection for Local RNA Secondary Structure in Coding Regions of Bacterial Genes

Free PMC article

Luba Katz et al. Genome Res.

Abstract

Redundancy of the genetic code dictates that a given protein can be encoded by a large collection of distinct mRNA species, potentially allowing mRNAs to simultaneously optimize desirable RNA structural features in addition to their protein-coding function. To determine whether natural mRNAs exhibit biases related to local RNA secondary structure, a new randomization procedure was developed, DicodonShuffle, which randomizes mRNA sequences while preserving the same encoded protein sequence, the same codon usage, and the same dinucleotide composition as the native message. Genes from 10 of 14 eubacterial species studied and one eukaryote, the yeast Saccharomyces cerevisiae, exhibited statistically significant biases in favor of local RNA structure as measured by folding free energy. Several significant associations suggest functional roles for mRNA structure, including stronger secondary structure bias in the coding regions of intron-containing yeast genes than in intronless genes, and significantly higher folding potential in polycistronic messages than in monocistronic messages in Escherichia coli. Potential secondary structure generally increased in genes from the 5' to the 3' end of E. coli operons, and secondary structure potential was conserved in homologous Salmonella typhi operons. These results are interpreted in terms of possible roles of RNA structures in RNA processing, regulation of mRNA stability, and translational control.
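The randomization idea in the abstract can be illustrated with a much simpler shuffle. The sketch below permutes synonymous codons among the positions that encode the same amino acid, which preserves the encoded protein and the codon usage. It does not preserve dinucleotide composition, so it is not DicodonShuffle itself, whose algorithm is not given in this abstract. The function names `translate` and `codon_shuffle` are hypothetical, and the standard genetic code is assumed.

```python
import random
from collections import defaultdict

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# alternative translation table is needed for the sequence being shuffled).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split a coding sequence into codons; length must be a multiple of 3."""
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a DNA coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(seq))


def codon_shuffle(seq, rng):
    """Permute synonymous codons among positions sharing an amino acid.

    Preserves the protein sequence and the codon counts. Dinucleotide
    composition is NOT preserved; this is a simplified illustration only.
    """
    codons = split_codons(seq)
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)
```

For example, `codon_shuffle("ATGGCTGCCGCAGCGCTGTTATAA", random.Random(0))` returns a sequence that translates to the same peptide and uses the same multiset of codons as the input.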

Figures

Figure 1
Distribution of Z scores for E. coli genes. Each gene in the E. coli genome was shuffled 20 times, and a Z score was calculated for each as described in the text. A histogram of these Z scores (solid) and a standard normal distribution (dashed) are shown.
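A per-gene Z score of this kind compares the native folding free energy with the distribution obtained from the shuffled versions of the same gene. The sketch below is one plausible form, not the paper's definition, which is given in the full text rather than here. The sign convention (native minus shuffled mean, so more stable native structure gives a negative score) and the use of the sample standard deviation are both assumptions, and the energies in the usage note are made-up numbers rather than results.

```python
from statistics import mean, stdev


def z_score(native_energy, shuffled_energies):
    """Z score of a native folding energy against shuffled controls.

    Assumed convention: (native - mean of shuffled) / sample std of shuffled.
    """
    if len(shuffled_energies) < 2:
        raise ValueError("need at least two shuffled energies")
    sigma = stdev(shuffled_energies)
    if sigma == 0:
        raise ValueError("shuffled energies have zero variance")
    return (native_energy - mean(shuffled_energies)) / sigma
```

With illustrative values, `z_score(-12.0, [-10.0, -9.0, -11.0])` evaluates to -2.0, since the shuffled mean is -10.0 and the sample standard deviation is 1.0. In the setting of the caption, `shuffled_energies` would hold 20 values per gene.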
Figure 2
Distribution of EFP along coding regions in (A) E. coli and (B) S. cerevisiae. For native and shuffled mRNAs, six subsets of 50-bp sequence windows from each end of the coding region (corresponding to the overlapping windows 1–10, 11–20, 21–30, 31–40, 41–50, 51–60 relative to the 5′ or 3′ end) were folded, average folding free energies were calculated for native (formula image) and CodonShuffled (formula image) sequences, and the formula image was determined for each segment. Because the step size between successive windows is 10 bp, each bin corresponds to 140 bases of sequence.
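The 140-base figure follows from the window geometry: ten 50-bp windows offset by 10 bp cover 50 + 9 × 10 = 140 bases. The short sketch below reproduces that arithmetic. The function names and the 0-based offsets measured from one end of the coding region are illustrative assumptions.

```python
def bin_span(window_len=50, step=10, windows_per_bin=10):
    """Number of bases covered by a run of overlapping windows."""
    return window_len + (windows_per_bin - 1) * step


def window_starts(bin_index, step=10, windows_per_bin=10):
    """0-based start offsets of the windows in a 0-based bin.

    Bin 0 holds windows 1-10, bin 1 holds windows 11-20, and so on.
    """
    first = bin_index * windows_per_bin
    return [w * step for w in range(first, first + windows_per_bin)]
```

Under these assumptions, bin 0 starts at offsets 0 through 90, and its last window ends at base 140, matching the caption.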
Figure 3
Bias for secondary structure in bacterial operons. Operons were classified on the basis of their locations in annotated polycistronic messages, and the average Z score was calculated for each subset. Numbers 1–6 correspond to the location of genes within operons relative to the 5′ end of the transcript. Genes in position 7 or higher within operons are not shown. The number of genes in each data set is indicated.
Figure 4
Genes in the yeast mitochondrial genome have unusually high folding potential. (A) Z scores were calculated for every gene in the mitochondrial genome of S. cerevisiae. For alternatively spliced genes, only the longest isoform was chosen for analysis. One gene was excluded because of its location inside the intron of another gene. (B) Positions of the group I (blue) and group II (green) introns relative to the start site are shown superimposed on the minimum free energy plot for the mitochondrial gene COX1 (magenta). Coordinates and identity of introns were obtained from GenBank (accession no. AJ011856).
