Nucleic Acids Res. 2017 Nov 16;45(20):11570-11581.
doi: 10.1093/nar/gkx815.

TurboFold II: RNA Structural Alignment and Secondary Structure Prediction Informed by Multiple Homologs

Zhen Tan et al. Nucleic Acids Res. 2017.

Abstract

This paper presents TurboFold II, an extension of the TurboFold algorithm for predicting secondary structures for multiple RNA homologs. TurboFold II augments the structure prediction capabilities of TurboFold by additionally providing multiple sequence alignments. In TurboFold II, the probabilities of alignment of nucleotide positions between all pairs of input sequences are estimated iteratively by incorporating information from both sequence identity and secondary structure. A multiple sequence alignment is obtained from these probabilities by using a probabilistic consistency transformation and a hierarchically computed guide tree. To assess TurboFold II, its sequence alignment and structure predictions were compared with leading tools, including methods that focus on alignment alone and methods that provide both alignment and structure prediction. TurboFold II achieves alignment accuracy comparable to that of MAFFT and higher than that of the other tools. Its structure prediction accuracy is comparable to that of the original TurboFold algorithm, which is one of the most accurate methods. TurboFold II is part of the RNAstructure software package, which is freely available for download at http://rna.urmc.rochester.edu under a GPL license.
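
The abstract names a probabilistic consistency transformation without giving its formula. As an illustration only, the sketch below assumes the widely used ProbCons-style form, in which the alignment probability for a pair of sequences x and y is re-estimated by averaging over every sequence z as an intermediate: P'(x_i ~ y_j) = (1/N) Σ_z Σ_k P(x_i ~ z_k) P(z_k ~ y_j). The function name `consistency_transform` and the dictionary layout are hypothetical and are not taken from RNAstructure.

```python
def _matmul(a, b):
    """Multiply an m x k matrix by a k x n matrix (plain nested lists)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def _transpose(m):
    return [list(row) for row in zip(*m)]


def _identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def consistency_transform(probs, lengths):
    """One round of a ProbCons-style consistency transformation (assumed form).

    probs[(x, y)] with x < y is a len_x by len_y matrix of pairwise
    alignment probabilities; lengths[x] is the length of sequence x.
    """
    n = len(lengths)

    def get(x, y):
        if x == y:
            # A sequence aligns to itself position by position.
            return _identity(lengths[x])
        return probs[(x, y)] if x < y else _transpose(probs[(y, x)])

    out = {}
    for x in range(n):
        for y in range(x + 1, n):
            acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in range(n):
                # Route the x-y alignment through intermediate sequence z.
                via_z = _matmul(get(x, z), get(z, y))
                for i in range(lengths[x]):
                    for j in range(lengths[y]):
                        acc[i][j] += via_z[i][j]
            out[(x, y)] = [[v / n for v in row] for row in acc]
    return out
```

Because z ranges over x and y as well, the original x-y probabilities contribute twice to the average, so pairs supported by third sequences are reinforced rather than replaced.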

Figures

Figure 1.
Flowchart for TurboFold II. The input is a set of homologous RNA sequences. In step 1, the pairwise posterior co-incidence probabilities (rectangular matrices) are calculated by pairwise HMM alignment. In step 2, base pairing probabilities (lower triangular matrices) are calculated using a partition function. In step 3, a match score is calculated for each sequence using the base pairing probabilities. In step 4, the co-incidence probabilities are re-estimated using the match scores. In step 5, the base pairing probabilities and co-incidence probabilities are used to calculate extrinsic information for each sequence, and, in step 6, the base pairing probabilities are re-estimated using the extrinsic information. Steps 3, 4, 5 and 6 form a loop that is repeated for multiple iterations. In step 7, a probabilistic consistency transformation is used to estimate a multiple sequence alignment. In step 8, structures are estimated for each sequence.
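
The control flow in this caption can be sketched as a short driver. This is only an outline of the ordering of steps 1 to 8: every numerical routine is passed in as a callable, and the names `turbofold2_flow`, `steps` and its keys are hypothetical placeholders, not functions from RNAstructure. The number of iterations is left as a required argument because the caption says only "multiple iterations".

```python
def turbofold2_flow(seqs, steps, n_iterations):
    """Outline of the Figure 1 flowchart; `steps` maps names to callables.

    All callables are hypothetical stand-ins supplied by the caller:
      pairwise_hmm(seq_a, seq_b, match_a, match_b) -> co-incidence matrix
      partition_function(seq, extrinsic)           -> base pairing probabilities
      match_score(bpp)                             -> per-sequence match score
      extrinsic(index, bpps, coinc)                -> extrinsic information
      consistency_alignment(seqs, coinc)           -> multiple alignment
      predict_structure(seq, bpp)                  -> structure
    """
    n = len(seqs)
    pairs = [(a, b) for a in range(n) for b in range(a + 1, n)]

    # Step 1: pairwise posterior co-incidence probabilities (no match scores yet).
    coinc = {(a, b): steps["pairwise_hmm"](seqs[a], seqs[b], None, None)
             for a, b in pairs}
    # Step 2: base pairing probabilities from a partition function.
    bpps = [steps["partition_function"](s, None) for s in seqs]

    for _ in range(n_iterations):
        # Step 3: a match score per sequence from its base pairing probabilities.
        match = [steps["match_score"](b) for b in bpps]
        # Step 4: re-estimate co-incidence probabilities using the match scores.
        coinc = {(a, b): steps["pairwise_hmm"](seqs[a], seqs[b], match[a], match[b])
                 for a, b in pairs}
        # Step 5: extrinsic information for each sequence.
        extr = [steps["extrinsic"](i, bpps, coinc) for i in range(n)]
        # Step 6: re-estimate base pairing probabilities with that information.
        bpps = [steps["partition_function"](s, e) for s, e in zip(seqs, extr)]

    # Step 7: multiple sequence alignment via the consistency transformation.
    alignment = steps["consistency_alignment"](seqs, coinc)
    # Step 8: one structure per sequence.
    structures = [steps["predict_structure"](s, b) for s, b in zip(seqs, bpps)]
    return alignment, structures
```

Passing the steps in as callables keeps the sketch honest about what it shows: the loop structure and data dependencies, not the underlying HMM or partition function calculations.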
Figure 2.
Sensitivity and PPV of alignment (left) and structure (right) predictions. Sensitivity and PPV of alignment predictions obtained by running the methods with 5, 10 or 20 input sequences on the small subunit rRNA, RNase P RNA, SRP RNA and telomerase RNA test datasets. The star (*) above the bar for a method indicates that the difference in sensitivity (or PPV) between the method and TurboFold II is statistically significant, as determined by a paired t-test. Numerical sensitivity and PPV values corresponding to the plots in the figures are provided in the Supplementary Materials in Tables S1 and S2 for alignment and structure prediction, respectively.
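
For readers unfamiliar with the two metrics, the sketch below computes them from sets of pairs, assuming the standard definitions: sensitivity is the fraction of reference pairs that are recovered, and PPV is the fraction of predicted pairs that appear in the reference. A pair can be an aligned nucleotide pair (alignment scoring) or a base pair (structure scoring). Exact matching is assumed here; any tolerance the paper applies when scoring is not reproduced. The function name `sensitivity_ppv` is hypothetical.

```python
def sensitivity_ppv(predicted, reference):
    """Return (sensitivity, PPV) for two collections of pairs.

    Each pair is a hashable tuple, e.g. (i, j) positions. Exact matches only.
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    # Guard against empty sets to avoid division by zero.
    sensitivity = true_pos / len(reference) if reference else 0.0
    ppv = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Example: four reference pairs, three predicted, two of them correct.
reference = {(1, 1), (2, 2), (3, 3), (4, 4)}
predicted = {(1, 1), (2, 2), (3, 4)}
sens, ppv = sensitivity_ppv(predicted, reference)
```

In this example, sensitivity is 2/4 and PPV is 2/3.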
Figure 3.
Predicted structures and alignments for Nocardioides albus, Propioniferax innocua and Salt Marsh A26. Structures for Nocardioides albus (A), Propioniferax innocua (B) and Salt Marsh A26 (C) as predicted by TurboFold II. (D) Database alignments for Nocardioides albus, Propioniferax innocua and Salt Marsh A26. Alignments as predicted by TurboFold II (E), ProbCons (F), ClustalW (G), Clustal Omega (H), LocARNA (I), MXSCARNA (J), MAFFT (K) and R-Coffee (L). The alignment accuracy is indicated as sensitivity and PPV for each method. The colored nucleotides correspond to helices in database structures.
Figure 4.
An example from the alignment of tRNA sequence homologs that illustrates how the update of posterior co-incidence probabilities introduced in TurboFold II can improve alignments by incorporating structural information. tRNA structures of (A) Halorubrum lacusprofundi (tdbD00000003) and (B) Streptococcus pneumoniae TIGR4 (tdbD00009726) (57,85) predicted by TurboFold II with three other tRNAs. (C) Predicted alignment of the two sequences. The nucleotides in predicted helices are indicated by corresponding colors in both the alignment and the structures. (D) The posterior co-incidence probabilities calculated by pairwise HMM alignment. The co-incidence probabilities are color coded as shown by the adjacent key. (E) The posterior co-incidence probabilities of pairwise HMM alignment incorporating the match score. (F–H) Posterior co-incidence probabilities incorporating the match score after the first (F), second (G) and third (H) iterations, respectively. (I) The alignment from the Sprinzl database (48,68). The colored blocks along the axes in the alignment probability plots (D–I) identify the nucleotides for helices shown in (A), (B) and (C).

References

    1. Stark B.C., Kole R., Bowman E.J., Altman S. Ribonuclease P: an enzyme with an essential RNA component. Proc. Natl. Acad. Sci. U.S.A. 1978; 75:3717–3721.
    2. Cech T.R., Zaug A.J., Grabowski P.J. In vitro splicing of the ribosomal RNA precursor of Tetrahymena: involvement of a guanosine nucleotide in the excision of the intervening sequence. Cell. 1981; 27:487–496.
    3. Doudna J.A., Cech T.R. The chemical repertoire of natural ribozymes. Nature. 2002; 418:222–228.
    4. Griffiths-Jones S. Annotating noncoding RNA genes. Annu. Rev. Genomics Hum. Genet. 2007; 8:279–298.
    5. Eddy S.R. Non-coding RNA genes and the modern RNA world. Nat. Rev. Genet. 2001; 2:919–929.
