, 327 (5968), 1014-8

NMR Structure Determination for Larger Proteins Using Backbone-Only Data

Srivatsan Raman et al. Science.

Abstract

Conventional protein structure determination from nuclear magnetic resonance (NMR) data relies heavily on side-chain proton-to-proton distances. The necessary side-chain resonance assignment, however, is labor-intensive and prone to error. Here we show that structures can be accurately determined without NMR information on the side chains for proteins up to 25 kilodaltons by incorporating backbone chemical shifts, residual dipolar couplings, and amide proton distances into the Rosetta protein structure modeling methodology. These data, which are too sparse for conventional methods, serve only to guide conformational search toward the lowest-energy conformations in the folding landscape; the details of the computed models are determined by the physical chemistry implicit in the Rosetta all-atom energy function. The new method is not hindered by the deuteration required to suppress nuclear relaxation processes for proteins greater than 15 kilodaltons and should enable routine NMR structure determination for larger proteins.

Figures

Figure 1
Impact of RDC data on conformational search. Lines depict RMSD histograms for the 10% of structures with the lowest low-resolution energy, generated using CS-Rosetta (black) or CS-RDC-Rosetta (red). (a) BcR103A (b) DvR115G (c) RrR43 (d) SrR115C
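The selection behind such a histogram can be sketched in a few lines of Python. This is only an illustration of the idea in the caption, not the authors' analysis code: the function names, the 1 Å bin width, and the input layout (parallel lists of per-model RMSD and low-resolution energy) are assumptions made here.

```python
import math

def lowest_energy_fraction(rmsds, energies, fraction=0.10):
    """Return the RMSDs of the models whose energy is in the lowest `fraction`.

    rmsds and energies are parallel lists, one entry per model (assumed layout).
    At least one model is always kept.
    """
    n_keep = max(1, int(len(energies) * fraction))
    # Model indices ordered from lowest to highest energy.
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    return [rmsds[i] for i in order[:n_keep]]

def histogram(values, bin_width=1.0):
    """Count values per bin; keys are the lower edges of the bins."""
    counts = {}
    for value in values:
        edge = math.floor(value / bin_width) * bin_width
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))
```

Running both functions on the CS-Rosetta and CS-RDC-Rosetta model sets separately would give the two curves compared in each panel.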
Figure 2
Determination of ALG13 structure from backbone NMR data with Rosetta. (a) RMSDs and energies of structures generated in batches of 2000 during the iterative protocol. Each generation of structures (colored from blue to red by generation number) is based on information from previous runs (cf. Methods). Strong convergence is already reached in the computationally less expensive low-resolution mode. The last generations (orange to red) increase both the precision and accuracy of the ensemble by refining the structures within the Rosetta all-atom energy. The RMSD is computed over the residues for which convergence within 3 Å root mean square fluctuations (RMSF) was reached in the 50 lowest energy Rosetta models (5–70, 81–139, 151–180). (b) Ensemble of 10 lowest energy Rosetta structures (below line in panel a). Regions with more than 3 Å RMSF are colored in grey. (c) Comparison of the RMSF at each residue in the low energy Rosetta ensemble to NMR R1 relaxation rate (red, relaxation rates; black, RMSF in the Rosetta ensemble). Regions variable in the low energy structures exhibit increased dynamics in solution; these data were not used in the structure calculation. (d) NMR solution ensemble based on side-chain NOEs, RDC and PRE data as deposited in the Protein Data Bank (PDB code: 2jzc).
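The residue selection described in panel (a) can be illustrated with a minimal Python sketch. It assumes the models are already superimposed and that each model is a list of (x, y, z) tuples with one position per residue (for example the C-alpha atom). The function names and the use of "less than or equal to" for the 3 Å cutoff are assumptions for illustration, not details taken from the paper.

```python
import math

def per_residue_rmsf(models):
    """RMSF per residue across pre-superimposed models.

    models: list of models; each model is a list of (x, y, z) tuples,
    one per residue (assumed layout).
    """
    n_models = len(models)
    n_residues = len(models[0])
    rmsf = []
    for i in range(n_residues):
        # Mean position of residue i across the ensemble.
        mean = [sum(m[i][k] for m in models) / n_models for k in range(3)]
        # Mean squared deviation of residue i from its mean position.
        msd = sum(
            sum((m[i][k] - mean[k]) ** 2 for k in range(3)) for m in models
        ) / n_models
        rmsf.append(math.sqrt(msd))
    return rmsf

def converged_residues(rmsf, cutoff=3.0):
    """0-based indices of residues whose RMSF is within the cutoff."""
    return [i for i, value in enumerate(rmsf) if value <= cutoff]

def rmsd_over(model, reference, residues):
    """RMSD between two pre-superimposed structures over selected residues."""
    squared = sum(
        sum((model[i][k] - reference[i][k]) ** 2 for k in range(3))
        for i in residues
    )
    return math.sqrt(squared / len(residues))
```

In this sketch the RMSF would be computed over the 50 lowest energy models, and the resulting residue list would then be passed to `rmsd_over` when comparing each model with the reference structure.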
Figure 3
Blind predictions with the CS-RDC-Rosetta and iterative CS-RDC-Rosetta protocols. Left panels: superposition of the 10 lowest-energy predicted structures (red) over the experimentally solved ensemble of NMR structures (blue); right panels: magnified view of the core side-chains. Rosetta models in panels (a–d) were determined with CS-RDC-Rosetta and in (e) with iterative CS-RDC-Rosetta. (a) BcR268F (b) DvR115G (c) MaR214A (d) SrR115C (e) AtT7
Figure 4
Effect of incorporation of experimental data on energy minimization. (a) The Rosetta all-atom energy (black line) has many local minima, making minimization difficult, but the global minimum is generally close to the native structure (N). The experimental bias (red line), while smoother, has degeneracies and lacks resolution because the data are sparse. Local minima of the all-atom energy and the experimental bias are uncorrelated far away from the native structure but coincide close to the native structure. Accordingly, far from the global minimum, including the experimental data during optimization usually results in higher energies (arrow 1), while close to the native structure (N), including the data results in lower energies (arrow 2). (b) Lines represent the lowest energies sampled by structures at various RMSDs after optimization in the absence (black line) or presence (red line) of experimental data. Generally, the all-atom energy and experimental data are in concordance for conformations close to the native protein structure but not for conformations far from the native structure. If this concordance condition is met, the experimental data can guide sampling towards the global minimum close to the native structure (arrow 2), so constrained optimization can reach lower-energy conformations than unconstrained optimization. Distant from the native structure, biased optimization is less effective than unconstrained optimization, leading to higher energies (arrow 1). (c–d) All-atom energy and RMSD of the final Rosetta ensemble from iterative refinement with and without experimental data. Lines represent the median of the 10 lowest energy models per RMSD bin. (c) 1f21 – an unsuccessful calculation; biased optimization with RDC data (red) yields energies similar to those of unbiased optimization (black); there is a large remaining energy gap to the native structure (blue dots). (d) ALG13 – a successful calculation; biased optimization with the experimental data (red) results in lower energies than unbiased optimization (black).
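The lines in panels (c) and (d), described as the median of the 10 lowest energy models per RMSD bin, could be computed along the following lines. This is a hedged sketch only: the function name, the 0.5 Å default bin width, and the parallel-list input are assumptions for illustration and are not specified in the caption.

```python
import math
from statistics import median

def binned_low_energy_median(rmsds, energies, bin_width=0.5, n_lowest=10):
    """Map each RMSD bin's lower edge to the median of its lowest energies.

    Models are grouped by RMSD bin; within each bin only the `n_lowest`
    lowest energies are kept before taking the median.
    """
    bins = {}
    for rmsd, energy in zip(rmsds, energies):
        index = math.floor(rmsd / bin_width)
        bins.setdefault(index, []).append(energy)
    result = {}
    for index in sorted(bins):
        lowest = sorted(bins[index])[:n_lowest]
        result[index * bin_width] = median(lowest)
    return result
```

Applying the function once to the models optimized with experimental data and once to those optimized without it would give the red and black curves, which can then be compared bin by bin.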
