Nucleic Acids Res. 2018 Jun 20;46(11):5381-5394.
doi: 10.1093/nar/gky285.

bpRNA: Large-Scale Automated Annotation and Analysis of RNA Secondary Structure

Free PMC article

Padideh Danaee et al. Nucleic Acids Res.

Abstract

While RNA secondary structure prediction from sequence data has made remarkable progress, there is a need for improved strategies for annotating the features of RNA secondary structures. Here, we present bpRNA, a novel annotation tool capable of parsing RNA structures, including complex pseudoknot-containing RNAs, to yield an objective, precise, compact, unambiguous, easily interpretable description of all loops, stems, and pseudoknots, along with the positions, sequence, and flanking base pairs of each such structural feature. We also introduce several new informative representations of RNA structure types to improve structure visualization and interpretation. We have further used bpRNA to generate a web-accessible meta-database, 'bpRNA-1m', of over 100 000 single-molecule, known secondary structures; this is both more fully and accurately annotated and over 20 times larger than existing databases. We use a subset of the database with highly similar (≥90% identical) sequences filtered out to report on statistical trends in sequence, flanking base pairs, and length. The bpRNA method and the bpRNA-1m database will be valuable resources both for specific analysis of individual RNA molecules and for large-scale analyses, such as updating RNA energy parameters for computational thermodynamic predictions, improving machine learning models for structure prediction, and benchmarking structure-prediction algorithms.
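To make the parsing step concrete, the following is a minimal sketch of how a multi-bracket dot-bracket string could be turned into a list of base pairs. It is not bpRNA's own code: the function name `parse_dot_bracket` is hypothetical, and the sketch assumes only four bracket types, with each bracket type nested within itself.

```python
# Hypothetical helper, not part of bpRNA: parse a multi-bracket dot-bracket
# string into sorted (i, j) base pairs using 0-based positions.
# Assumption: only (), [], {} and <> are used as bracket types.
CLOSE_TO_OPEN = {")": "(", "]": "[", "}": "{", ">": "<"}


def parse_dot_bracket(structure):
    # One stack per bracket type, so pairs of different types may cross.
    stacks = {opener: [] for opener in CLOSE_TO_OPEN.values()}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in CLOSE_TO_OPEN:
            opener = CLOSE_TO_OPEN[ch]
            if not stacks[opener]:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stacks[opener].pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {pos}")
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at position {stack[-1]}")
    return sorted(pairs)


# A small pseudoknot: the square-bracket pairs cross the round-bracket pairs.
example_pairs = parse_dot_bracket("((..[[..))..]]")
```

Keeping a separate stack per bracket type is what allows crossing (pseudoknotted) pairs to be read without ambiguity.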

Figures

Figure 1.
RNA structure types. (A) Cartoon schematic of RNA structure types. (B) Hairpins have one closing pair and one mismatch pair with nucleotides defined by ordering from 5′ to 3′. (C) Internal loops have two closing base pairs and two mismatch pairs each defined by ordering from 5′ to 3′ relative to the 5′ internal loop strand. The nucleotides of the closing pairs are defined as 5′ or 3′ based on their positions relative to the loop sequence. (D) Bulges have one loop strand, but have two closing base pairs and two mismatch pairs defined 5′ to 3′. (E) Multiloops have a closing pair for each branch. The nucleotides of the closing pairs are defined as 5′ or 3′ based on their positions relative to the loop sequence. Red dashed line represents the common axis of coaxially stacked stems. (F) A depiction of RNA page number, which can be viewed as separate half-planes containing edges corresponding to base pairs. Each symbol type corresponds to a separate page, and edges within a page are nested.
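The page idea in panel (F) can be illustrated with a short sketch that places base pairs on pages so that no two pairs on the same page cross. This is a simple greedy assignment chosen for illustration, not the procedure bpRNA uses, and it is not guaranteed to find the minimum number of pages; the names `crosses` and `assign_pages` are hypothetical.

```python
# Illustrative greedy page assignment (an assumption, not bpRNA's method).
# Base pairs are (i, j) tuples with i < j.


def crosses(p, q):
    # Two pairs cross when exactly one end of one lies inside the other.
    (i, j), (k, l) = sorted((p, q))
    return i < k < j < l


def assign_pages(pairs):
    pages = []
    for pair in sorted(pairs):
        for page in pages:
            # Reuse the first page on which this pair stays nested.
            if not any(crosses(pair, other) for other in page):
                page.append(pair)
                break
        else:
            pages.append([pair])
    return pages


# Two helices that cross each other need two pages.
example_pages = assign_pages([(0, 9), (1, 8), (4, 13), (5, 12)])
```

Under this sketch, a pseudoknot-free structure always fits on a single page, and each further page would map to a further bracket symbol.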
Figure 2.
Segment graph example. (A) Secondary structure of the Anopheles gambiae drz-agam-2-2 ribozyme. (B) The segments are the vertices of the segment graph and are ordered from 5′ to 3′; directed edges are defined by unpaired strands connecting segments. (C) Segments with base pairs crossing other segments comprise the PK-graph. A maximally weighted independent set is selected by dynamic programming, with the remaining segments defined as pseudoknots. (D) The pseudoknot-free segment graph is created after removing PK base pairs and allows easy annotation of loops. (E) The structure array enhances bpRNA’s multi-bracket dot-bracket sequence by labeling each position’s structure type. Strands participating in pseudoknots are labeled in the structure array by their loop type in the structure resulting from the removal of PKs.
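As a rough illustration of the structure array in panel (E), the sketch below labels every position of a pseudoknot-free dot-bracket string with a structure type. It is a simplified stand-in rather than bpRNA's implementation: the function name `structure_array` is hypothetical, the one-letter codes (S stem, H hairpin, B bulge, I internal loop, M multiloop, X external loop, E end) are assumed here, and the input is assumed to be balanced and to use round brackets only.

```python
# Simplified sketch (assumption, not bpRNA's code): label each position of a
# balanced, pseudoknot-free dot-bracket string with a structure type.


def structure_array(db):
    n = len(db)
    partner = [-1] * n
    stack = []
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            i = stack.pop()
            partner[i], partner[pos] = pos, i

    labels = ["S" if partner[p] >= 0 else "" for p in range(n)]
    todo = [(-1, n)]  # (-1, n) stands for the exterior of the molecule
    while todo:
        i, j = todo.pop()
        unpaired, children = [], []
        p = i + 1
        while p < j:
            if partner[p] > p:  # a pair opening directly inside this loop
                children.append((p, partner[p]))
                p = partner[p] + 1
            else:
                unpaired.append(p)
                p += 1
        todo.extend(children)

        if i < 0:
            # Exterior: dangling ends are E, strands between stems are X.
            first = children[0][0] if children else n
            last = children[-1][1] if children else -1
            for p in unpaired:
                labels[p] = "E" if p < first or p > last else "X"
            continue
        if not children:
            kind = "H"
        elif len(children) == 1:
            k, l = children[0]
            kind = "I" if (k > i + 1 and l < j - 1) else "B"
        else:
            kind = "M"
        for p in unpaired:
            labels[p] = kind
    return "".join(labels)


example_array = structure_array("..((..((...))..))..(((....))).")
```

Each loop is classified by the number of helices branching from it, which is why the sketch walks the structure one closing pair at a time.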
Figure 3.
Hairpins in bpRNA-1m(90). (A) The distribution of hairpin loop lengths in bpRNA-1m(90) has two primary peaks, overlapping the same peak for subsets defined by closing pairs. (B) Heat map shows the frequency of nucleotides occurring in closing base pairs. (C) Heat map shows the frequency of pairs of nucleotides occurring in hairpin mismatch pairs.
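A tally of this kind could be sketched as follows, assuming each record is a sequence paired with a pseudoknot-free dot-bracket string. The helper names `hairpins` and `length_histogram` are hypothetical, and treating the mismatch pair as the first and last loop nucleotides is an assumption made for this example.

```python
# Illustrative sketch (not bpRNA's code): extract hairpin loops with their
# closing pair, mismatch pair and length, then tally loop lengths.
import re
from collections import Counter

# A hairpin loop is a run of unpaired positions enclosed by one base pair.
HAIRPIN = re.compile(r"\((\.+)\)")


def hairpins(seq, db):
    if len(seq) != len(db):
        raise ValueError("sequence and structure must have equal length")
    found = []
    for m in HAIRPIN.finditer(db):
        i, j = m.start(), m.end() - 1  # positions of the closing pair
        loop = seq[i + 1:j]
        found.append({
            "closing_pair": seq[i] + seq[j],
            # Assumption: mismatch pair = first and last loop nucleotides.
            "mismatch_pair": loop[0] + loop[-1],
            "loop": loop,
            "length": len(loop),
        })
    return found


def length_histogram(records):
    # records: iterable of (sequence, dot-bracket) tuples
    return Counter(h["length"] for seq, db in records for h in hairpins(seq, db))


example_hairpins = hairpins("GGCGAAAGCC", "(((....)))")
```

Grouping the same records by `closing_pair` before counting would give the per-closing-pair subsets referred to in panel (A).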
Figure 4.
Tetraloops and heptaloops. (A) Scatterplot compares the frequency of tetraloop sequences to destabilizing energy. (B) Sequence LOGOs demonstrate sequence biases in the most enriched tetraloops. (C) Scatterplot compares the frequency of heptaloop sequences to destabilizing energy. (D) Sequence LOGOs demonstrate the sequence biases in the most enriched heptaloops.
Figure 5.
Internal loops. (A) Heat map shows the frequency of internal loops based on 5′ and 3′ loop length. (B) Heat map shows the frequency of base pairs occurring in 5′ and 3′ internal loop closing base pairs. (C) Heat map shows the frequency of pairs of nucleotides occurring in 5′ and 3′ internal loop mismatch pairs. (D) Stacked histograms of 5′ internal loop lengths when organized by the 5′ closing base pair. (E) Stacked histograms of the 3′ internal loop lengths when organized by the 3′ closing base pair.
Figure 6.
Bulges. (A) Bulge length histogram. (B) Nucleotide frequency in bulges of length 1. (C) Heat map of closing base pairs. (D) Heat map of mismatches. (E) Bulge length distribution for different 5′ closing base pairs. (F) Bulge length distribution for different 3′ closing base pairs.
Figure 7.
Multiloops. (A) Histogram of branch number for all multiloops in bpRNA-1m(90). (B) Branch length for multiloops with different branch numbers. (C) Closing pair heat map. (D) Mismatch heat map. (E) Length distribution for different GC closing base pairs. (F) Length distribution for different AU closing base pairs.
Figure 8.
Stems and pseudoknots. (A) The frequency of stem types compared to their rank has a Zipfian distribution with a scale factor approximately equal to –1.00. (B) bpRNA classifies pseudoknots by the loops that their base pairs connect when the pseudoknots are removed.
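One common way to estimate such an exponent is a least-squares line through log(frequency) against log(rank), where a slope near –1 indicates a Zipfian distribution. The sketch below shows that calculation under stated assumptions: the function name `zipf_exponent` is hypothetical, and the source does not say which fitting procedure the authors used.

```python
# Illustrative sketch (assumption, not the authors' procedure): estimate a
# Zipf exponent as the least-squares slope of log(frequency) vs log(rank).
import math


def zipf_exponent(counts):
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two non-zero counts")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic counts that fall off as 1/rank, used only to exercise the function.
example_slope = zipf_exponent([1000.0 / rank for rank in range(1, 51)])
```

With real stem-type counts the slope would only approximate –1, and the low-frequency tail can dominate the fit, so the ranks included in the regression are a choice worth reporting.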
