Science. 2021 Aug 27;373(6558):1047-1051.
doi: 10.1126/science.abe5650.

Geometric deep learning of RNA structure

Raphael J L Townshend et al. Science. 2021.

Abstract

RNA molecules adopt three-dimensional structures that are critical to their function and of interest in drug discovery. Few RNA structures are known, however, and predicting them computationally has proven challenging. We introduce a machine learning approach that enables identification of accurate structural models without assumptions about their defining characteristics, despite being trained with only 18 known RNA structures. The resulting scoring function, the Atomic Rotationally Equivariant Scorer (ARES), substantially outperforms previous methods and consistently produces the best results in community-wide blind RNA structure prediction challenges. By learning effectively even from a small amount of data, our approach overcomes a major limitation of standard deep neural networks. Because it uses only atomic coordinates as inputs and incorporates no RNA-specific information, this approach is applicable to diverse problems in structural biology, chemistry, materials science, and beyond.

Conflict of interest statement

Competing interests: Stanford University has filed a provisional patent application related to this work. R.J.L.T. is the founder of Atomic AI, an artificial intelligence–driven rational design company. R.D. has received honoraria for seminars at Ribometrix and Pfizer.

Figures

Fig. 1. The ARES network.
(A) ARES takes as input a structural model, specified by each atom’s element type and 3D coordinates. Atom features are repeatedly updated based on the features of nearby atoms. This process results in a set of features encoding each atom’s environment. Each feature is then averaged across all atoms, and the resulting averages are fed into additional neural network layers, which output the predicted RMSD of the structural model from the true structure of the RNA molecule. Figure S1 illustrates the ARES architecture in more detail. (B) To perform structure prediction, we use ARES to score candidate structural models (e.g., those generated by the FARFAR2 sampling software), selecting the models that ARES predicts to be most accurate (i.e., lowest RMSD). (C) ARES is trained using 18 RNA structures solved before 2007. (D) We benchmark ARES using more recently solved RNA structures, most of which are much larger than any of those used for training. Representative examples of structures used for training and benchmarking are shown in this figure, with the remainder in fig. S2.
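The data flow described in panels (A) and (B) can be sketched roughly as follows. This is not the ARES architecture: the rotationally equivariant layers implied by the scorer's name are replaced here by a plain distance-based neighbor average, and the weights are random rather than trained. The element vocabulary, the 5 Å cutoff, the layer widths, and all function names are assumptions made for illustration only.

```python
import numpy as np

ELEMENTS = ["C", "N", "O", "P"]  # assumed element vocabulary for this sketch


def one_hot(elements):
    """Encode each atom's element type as a one-hot feature vector."""
    idx = [ELEMENTS.index(e) for e in elements]
    return np.eye(len(ELEMENTS))[idx]


def update_features(feats, coords, weight, cutoff=5.0):
    """One update step: each atom averages the features of all atoms within
    `cutoff` angstroms (itself included), then applies a linear map and ReLU."""
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    mask = (dists <= cutoff).astype(float)
    neighbor_mean = mask @ feats / mask.sum(axis=1, keepdims=True)
    return np.maximum(neighbor_mean @ weight, 0.0)


def make_params(rng, width=8):
    """Random, untrained weights (hypothetical layer sizes)."""
    return {
        "layers": [rng.normal(size=(len(ELEMENTS), width)),
                   rng.normal(size=(width, width))],
        "dense": rng.normal(size=(width, width)),
        "out": rng.normal(size=width),
    }


def predict_rmsd(elements, coords, params):
    """Map element types and 3D coordinates to one scalar score."""
    feats = one_hot(elements)
    for weight in params["layers"]:
        feats = update_features(feats, coords, weight)
    pooled = feats.mean(axis=0)  # average each feature across all atoms
    hidden = np.maximum(pooled @ params["dense"], 0.0)
    return float(hidden @ params["out"])


def select_best(candidates, params):
    """Score every candidate model and return the index of the lowest score."""
    scores = [predict_rmsd(elems, xyz, params) for elems, xyz in candidates]
    return int(np.argmin(scores)), scores
```

Because this sketch depends on coordinates only through interatomic distances, and pools by averaging, its output does not change when a model is rotated, translated, or has its atoms reordered. A trained scorer would additionally need a loss against the true RMSD, which is omitted here.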
Fig. 2. ARES substantially outperforms previous scoring functions at identifying accurate structural models.
(A) Given a large set of candidate structural models for each RNA in benchmark 1—which includes some models restrained to be close to the experimentally determined (native) structure—we rank the models using ARES and three leading scoring functions. The model scored as best by ARES is usually more accurate (as assessed by RMSD from the native structure) than the model scored as best by the other scoring functions. Each cross corresponds to one RNA. “Rosetta” indicates the most recent (2020) version of the Rosetta scoring function. (B) When using ARES, the 10 best-scoring structural models for each RNA in benchmark 1 include an accurate model more frequently than when using the other scoring functions. (C) For each RNA in benchmark 1, we determine the rank of the best-scoring near-native structural model—that is, how far down the ranked list we need to go to include one near-native structural model (RMSD < 2 Å). This rank is usually lower (better) for ARES than for the other scoring functions. Across the RNAs, the mean rank of the best-scoring near-native model is 3.6 for ARES, compared with 73.0, 26.4, and 127.7 for Rosetta, RASP, and 3dRNAscore, respectively (geometric means). (D) For each of the 16 RNAs in benchmark 2—for which all structural models were generated without using any template structures or other experimental data that could provide information on local tertiary structure—we determine the RMSD of the model scored as best by each of seven scoring functions. For each scoring function, we plot the median across RNAs, with a 95% confidence interval determined by bootstrapping (12). ARES significantly outperforms each of the other scoring functions [P values 0.001 to 0.016 (12)]. Of the other scoring functions, none significantly outperforms any other [P values 0.24 to 0.66].
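The summary statistics in panels (C) and (D) can be illustrated with a short sketch: the rank of the best-scoring near-native model for one RNA, a geometric mean of such ranks across RNAs, and a bootstrapped confidence interval for a median. The function names are hypothetical, and the percentile bootstrap shown here is an assumption; the exact resampling procedure is described in the paper's reference (12), not in this caption.

```python
import math
import random
import statistics


def rank_of_first_near_native(scores, rmsds, threshold=2.0):
    """Return the 1-based rank of the best-scoring model with RMSD < threshold.

    Models are ranked by ascending score (lower score = predicted to be more
    accurate). Returns None if no model is near-native.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    for rank, i in enumerate(order, start=1):
        if rmsds[i] < threshold:
            return rank
    return None


def geometric_mean(values):
    """Geometric mean of positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def bootstrap_median_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the median (an assumed procedure)."""
    rng = random.Random(seed)
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = medians[int((1 - level) / 2 * n_resamples)]
    upper = medians[int((1 + level) / 2 * n_resamples) - 1]
    return lower, upper
```

A geometric mean suits ranks that span orders of magnitude, since a single RNA with a very poor rank would otherwise dominate an arithmetic mean.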
Fig. 3. ARES achieves state-of-the-art results in blind RNA structure prediction.
(A) We submitted structural models, which ARES selected from sets of candidates generated by FARFAR2, to four recent rounds of the RNA-Puzzles blind structure prediction challenge: RNA A (the adenovirus VA-I RNA), RNA B (the Geobacillus kaustophilus T-box discriminator–tRNAGly), RNA C (the Bacillus subtilis T-box–tRNAGly), and RNA D (the Nocardia farcinica T-box–tRNAIle), whose structures are now in the Protein Data Bank with IDs 6OL3, 6PMO, 6POM, and 6UFM, respectively. For all four RNAs, ARES produced the most accurate structural model of any method. The dash indicates no submission. (B) For RNA A, the adenovirus VA-I RNA, ARES selected a structural model (blue) with a 4.8-Å RMSD to the experimentally determined structure (green), which was not available at prediction time. (C) For RNA A, the most accurate structural model produced by any other method (orange; produced by Rosetta) had an RMSD of 7.7 Å. ARES predicts the 3D geometry of the hinge motif at lower left much more accurately than Rosetta (fig. S7). Figure S8 illustrates results for the other RNAs.
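The RMSD values quoted in these captions compare a model with the experimentally determined structure. One common way to compute such a value is to superpose the two coordinate sets optimally (the Kabsch algorithm) before measuring the deviation. The sketch below assumes that approach and assumes the two arrays list the same atoms in the same order; the caption does not state which atoms or which alignment protocol the authors used, and the function name is hypothetical.

```python
import numpy as np


def superposed_rmsd(model, native):
    """RMSD between two (N, 3) coordinate arrays after optimal rigid-body
    superposition (Kabsch algorithm). Rows must correspond atom for atom."""
    # Remove translation by centering both structures on their centroids.
    p = model - model.mean(axis=0)
    q = native - native.mean(axis=0)
    # The SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation,
    # not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    diff = p @ rotation - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

The reflection check matters for chiral molecules such as RNA: without it, a mirror image of the native structure would score as a perfect match.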
Fig. 4. ARES learns to identify key characteristics of RNA structure that are not specified in advance.
(A) As the distance between two complementary strands of an RNA double helix is varied, ARES assigns the best scores when the distance closely approximates the experimentally observed distance (red vertical line). The distance is measured between the C-4′ atoms of the central base pair (yellow dotted lines). (B) ARES’s learned features separate RNA structures according to the fraction of bases that form Watson-Crick pairs (left) and the average number of hydrogen bonds per base (right). The arrow in each plot indicates the direction of separation. Learned features 1, 2, and 3 are the first, second, and third principal components, respectively, of the activation values of the 256 nodes in ARES’s penultimate layer across 1576 RNA structures. Each dot corresponds to one of these structures (12).
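The "learned features" in panel (B) are principal components of penultimate-layer activations. A minimal sketch of that projection, using a singular value decomposition, is shown below. The activation matrix here is random stand-in data with the dimensions given in the caption (1576 structures by 256 nodes); the real activations are not available on this page, and the caption does not say which PCA implementation the authors used.

```python
import numpy as np


def principal_components(activations, n_components=3):
    """Project rows of `activations` onto their leading principal components.

    Returns the projected coordinates and the fraction of total variance
    explained by each retained component.
    """
    centered = activations - activations.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    explained = (s ** 2) / (s ** 2).sum()
    return projected, explained[:n_components]


# Stand-in data only: random values in place of real network activations.
rng = np.random.default_rng(0)
fake_activations = rng.normal(size=(1576, 256))
features, explained = principal_components(fake_activations)
```

Each row of `features` would correspond to one dot in the plots, with its three columns playing the role of learned features 1, 2, and 3.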

References

    1. Cech TR, Steitz JA, Cell 157, 77–94 (2014).
    2. Warner KD, Hajdin CE, Weeks KM, Nat. Rev. Drug Discov. 17, 547–558 (2018).
    3. Churkin A et al., Brief. Bioinform. 19, 350–358 (2018).
    4. ENCODE Project Consortium, Nature 489, 57–74 (2012).
    5. Berman HM et al., Nucleic Acids Res. 28, 235–242 (2000).