eLife. 2016 Sep 26;5:e17219.
doi: 10.7554/eLife.17219.

Automated Structure Refinement of Macromolecular Assemblies From cryo-EM Maps Using Rosetta

Free PMC article

Ray Yu-Ruei Wang et al. eLife. 2016.

Cryo-EM has revealed the structures of many challenging yet exciting macromolecular assemblies at near-atomic resolution (3–4.5 Å), providing molecular descriptions of biological phenomena. However, at these resolutions, accurately positioning individual atoms remains challenging and error-prone. Manually refining the thousands of amino acids typical of a macromolecular assembly is tedious and time-consuming. We present an automated method that can improve the atomic details of models manually built into near-atomic-resolution cryo-EM maps. Applying the method to three systems recently solved by cryo-EM, we improve model geometry while maintaining the fit-to-density. Backbone placement errors are automatically detected and corrected, and the refinement shows a large radius of convergence. The results demonstrate that the method is amenable to structures that are symmetric, very large, or contain RNA and covalently bound ligands. The method should streamline the cryo-EM structure determination process, providing accurate and unbiased atomic interpretations of such maps.

Keywords: Rosetta; atomic models; biophysics; computational biology; cryo-EM; macromolecular assemblies; membrane proteins; structural biology; structure refinement; systems biology.

Conflict of interest statement

YS: Co-founder of Cyrus Biotechnology, Inc., which will develop and market graphic-interface software for using Rosetta. The other authors declare that no competing interests exist.


Figure 1. An overview of the three stages of automated refinement.
(Left) In stage 1, problematic regions are predicted using a newly developed error predictor that looks for local strain in the model and poor local fit-to-density. The selected regions are subject to iterative fragment-based rebuilding within a Monte Carlo sampling trajectory. Refinement in this stage uses only one half of the data, referred to as the training map. (Middle) In stage 2, the best models from the ~5000 independent Monte Carlo trajectories are selected: first on agreement with the validation map (independently constructed from the other half of the data), then on model geometry as assessed by MolProbity, and finally on agreement with the full reconstruction. At this point, the selected models should in general have good fit-to-density and good geometry without overfitting to the data. (Right) In stage 3, the 10 best models are optimized against the full reconstruction. The two half maps are first used to choose the optimal density weight for refinement against the full reconstruction. These top 10 models are then optimized (without large-scale backbone rebuilding) into the full reconstruction, alternating iteratively with voxel-size refinement. Finally, the models are subject to B-factor refinement.
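The stage-2 selection can be pictured as a tiered filter over the trajectory outputs. The Python sketch below is illustrative only: the `Model` fields, the helper `select_models`, and the fractions kept at each tier (`keep_after_free`, `keep_after_geom`) are hypothetical placeholders, and only the order of the three criteria follows the caption.

```python
import random
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    free_fsc: float    # fit to the validation (held-out half) map; higher is better
    molprobity: float  # MolProbity score; lower is better
    full_fsc: float    # fit to the full reconstruction; higher is better


def select_models(models, keep_after_free=0.2, keep_after_geom=0.5, n_final=10):
    """Tiered selection: validation-map fit, then geometry, then full-map fit.

    The kept fractions are hypothetical; the caption specifies only the order.
    """
    # Tier 1: keep the models that agree best with the independent validation map.
    tier1 = sorted(models, key=lambda m: m.free_fsc, reverse=True)
    tier1 = tier1[: max(1, int(len(tier1) * keep_after_free))]
    # Tier 2: among those, keep the models with the best (lowest) MolProbity score.
    tier2 = sorted(tier1, key=lambda m: m.molprobity)
    tier2 = tier2[: max(1, int(len(tier2) * keep_after_geom))]
    # Tier 3: rank the survivors by agreement with the full reconstruction.
    return sorted(tier2, key=lambda m: m.full_fsc, reverse=True)[:n_final]


# Usage with 5000 synthetic trajectories (random scores, illustration only).
rng = random.Random(0)
pool = [Model(f"traj{i}", rng.random(), rng.uniform(1.0, 3.0), rng.random())
        for i in range(5000)]
best10 = select_models(pool)
print([m.name for m in best10])
```

Filtering on the held-out map first is what keeps the selection from rewarding models that merely overfit the training map.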
Figure 1—figure supplement 1. A close-up view of model strain indicating errors in density-optimized TRPV1 models produced by the superseded Rosetta approach.
The two insets show regions of models refined by the superseded Rosetta approach where strain can indicate errors. In both cases, the phenylalanine sidechains fit the density well but show geometric strain around the Cβ atom. The type of strain (as evaluated by MolProbity) is indicated by model color, using the key on the right.
Figure 1—figure supplement 2. Incorporating model strain improves error detection.
Guided by the 3.3-Å 20S proteasome reconstruction, we evaluated 500 models against the high-resolution crystal structure. We plot the precision (y-axis) and recall (x-axis) of predicting which residues were incorrectly placed (RMS deviation > 1 Å). Density alone (pink line) is outperformed by a combination of density and model strain (blue line). Our refinement approach uses four points on this curve when picking density + model strain cutoffs, labeled 'Stage 1–4' on the plot.
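Precision and recall here follow their standard definitions, with a residue counted as truly misplaced when its RMS deviation from the crystal structure exceeds 1 Å. The sketch below assumes a hypothetical combined predictor, `flag_errors`, that flags a residue when its local density z-score falls below one cutoff or its strain exceeds another; the caption does not say how the two signals are combined, and all arrays are made-up toy values.

```python
import numpy as np


def precision_recall(predicted, truth):
    """Precision and recall of boolean per-residue error predictions."""
    predicted = np.asarray(predicted, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(predicted & truth)
    fp = np.sum(predicted & ~truth)
    fn = np.sum(~predicted & truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)


def flag_errors(density_z, strain, density_cut, strain_cut):
    """Hypothetical predictor: poor local density fit OR high local strain."""
    return (np.asarray(density_z) < density_cut) | (np.asarray(strain) > strain_cut)


# Toy data for six residues.
rms = np.array([0.3, 1.5, 0.8, 2.0, 0.5, 1.2])        # deviation from crystal structure, Å
truth = rms > 1.0                                      # truly misplaced residues
density_z = np.array([0.5, -1.2, 0.1, -0.3, -1.5, 0.2])
strain = np.array([0.1, 0.2, 0.0, 3.0, 0.1, 2.5])

density_only = precision_recall(flag_errors(density_z, strain, -1.0, np.inf), truth)
combined = precision_recall(flag_errors(density_z, strain, -1.0, 1.0), truth)
print(density_only, combined)
```

Sweeping the two cutoffs traces out a curve like the one in the plot; each chosen operating point trades precision against recall.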
Figure 1—figure supplement 3. Density weight optimization against half maps for the mitoribosome.
Before refinement against the full reconstruction, we optimize the weight on the fit-to-density energy using half maps, to avoid overfitting. We plot several key metrics as a function of the weight on the fit-to-density score term (x-axis), including Fourier shell correlation (FSC) overfitting (FSCwork − FSCfree, top histogram), the Rosetta energy (second histogram), and several MolProbity model-geometry terms (histograms 3–6). In all cases, there is a sharp inflection point at which overfitting increases and geometry becomes notably worse. As a general rule of thumb, we capture this inflection point by choosing the weight that maximizes FSCfree − 0.04 × (per-residue energy).
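The rule of thumb reduces to maximizing a single score over the scanned weights. The sketch below applies it to a hypothetical weight scan; the helper name `choose_density_weight`, the weights, the FSCfree values, and the per-residue energies are all illustrative numbers rather than data from the paper.

```python
def choose_density_weight(weights, fsc_free, energy_per_residue, energy_scale=0.04):
    """Return the weight maximizing FSCfree - energy_scale * (per-residue energy)."""
    scores = [f - energy_scale * e for f, e in zip(fsc_free, energy_per_residue)]
    best = max(range(len(weights)), key=lambda i: scores[i])
    return weights[best], scores[best]


# Hypothetical scan: FSCfree rises slowly with weight while energy worsens sharply.
weights = [5, 10, 20, 35, 50, 75]
fsc_free = [0.50, 0.53, 0.56, 0.57, 0.575, 0.578]
energy = [-3.2, -3.1, -2.9, -2.4, -1.6, -0.5]   # per-residue Rosetta energy
print(choose_density_weight(weights, fsc_free, energy))
```

Because a lower (more negative) energy is better, subtracting the scaled energy rewards weights that keep the model physically reasonable, so the maximum lands near the inflection point rather than at the largest weight.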
Figure 1—figure supplement 4. Model geometry is improved with a separate pre-proline potential.
Refined models initially had poor pre-proline geometry. A new backbone torsional potential was therefore created that treats pre-proline and pre-non-proline residues separately. The plot shows the old potential (left), the new pre-non-proline potential (middle), and the pre-proline potential (right) for three different residue identities. Color indicates the unweighted energy values, using the key on the right.
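The key idea is that the table used to score residue i depends on whether residue i+1 is proline. The sketch below shows only that dispatch; the 10° binning, the helper names, and the toy tables are hypothetical, and for brevity it uses nearest-bin lookup where a production potential would normally interpolate between bins.

```python
import numpy as np

BIN_DEG = 10.0                # hypothetical bin width in degrees
N_BINS = int(360 / BIN_DEG)   # 36 bins per torsion


def bin_index(angle_deg):
    """Map a torsion angle in degrees to a bin index in [0, N_BINS)."""
    return int(((angle_deg + 180.0) % 360.0) // BIN_DEG)


def rama_energy(sequence, phi, psi, pre_pro_tables, other_tables):
    """Sum per-residue backbone torsion energies.

    Residue i is scored with its pre-proline table when residue i+1 is proline,
    and with its pre-non-proline table otherwise.
    """
    total = 0.0
    for i, (aa, ph, ps) in enumerate(zip(sequence, phi, psi)):
        next_is_pro = i + 1 < len(sequence) and sequence[i + 1] == "P"
        tables = pre_pro_tables if next_is_pro else other_tables
        total += tables[aa][bin_index(ph), bin_index(ps)]
    return total


# Toy tables: 0.0 everywhere for pre-non-proline, 1.0 everywhere for pre-proline.
other = {aa: np.zeros((N_BINS, N_BINS)) for aa in "AGP"}
pre_pro = {aa: np.ones((N_BINS, N_BINS)) for aa in "AGP"}
phi = [-60.0, -75.0, -65.0, -60.0]
psi = [-45.0, 150.0, 140.0, -40.0]
print(rama_energy("AGPA", phi, psi, pre_pro, other))  # only G precedes P -> 1.0
```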
Figure 2. The accuracy of voxel-size refinement and the effect of B-factor sharpening in Rosetta refinement.
(A) Voxel-size refinement on perturbed models. Perturbed structures were generated by running short MD trajectories in Rosetta, followed by all-atom minimization. Refining the voxel size against the perturbed models yields the distribution shown in red. After iterated cycles of voxel-size refinement and all-atom refinement, the voxel size converges significantly better (blue line). (B) Rosetta structure refinement over a range of B-factor sharpening values. The integrated Fourier shell correlation evaluated against the validation map (free-iFSC) is plotted as a function of the B-factor sharpening applied to the training map. The results indicate that our refinement method is not particularly sensitive to the extent of B-factor sharpening, behaving similarly for sharpening values between −40 and −130. Error bars show the standard deviation of the free-iFSC among the top 10 ensemble models (see Materials and methods for the ensemble selection method).
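Panel B varies the B-factor used to sharpen the training map. Global B-factor sharpening is conventionally applied in Fourier space by scaling each coefficient by exp(−B·s²/4), where s = 1/d is the spatial frequency in Å⁻¹, so a negative B amplifies high-resolution terms. The NumPy sketch below assumes a map held in memory as a 3D array with cubic voxels; `sharpen_map` is a hypothetical helper, not a Rosetta function.

```python
import numpy as np


def sharpen_map(density, voxel_size, b_factor):
    """Apply a global B-factor (Å^2) to a 3D map in Fourier space.

    Negative b_factor sharpens; positive b_factor blurs. Each Fourier
    coefficient is scaled by exp(-B * s^2 / 4), with s = 1/d in 1/Å.
    """
    freqs = [np.fft.fftfreq(n, d=voxel_size) for n in density.shape]  # cycles per Å
    kx, ky, kz = np.meshgrid(*freqs, indexing="ij")
    s2 = kx**2 + ky**2 + kz**2
    ft = np.fft.fftn(density)
    ft *= np.exp(-b_factor * s2 / 4.0)
    # The filter is symmetric in s, so the result is real up to rounding.
    return np.fft.ifftn(ft).real


rng = np.random.default_rng(0)
training_map = rng.normal(size=(16, 16, 16))
sharpened = sharpen_map(training_map, voxel_size=1.0, b_factor=-100.0)
```

The zero-frequency term (s = 0) is left unchanged, so sharpening redistributes weight toward fine detail without changing the map's total density.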
Figure 3. Refinement of the apo TRPV1 channel (EMD-5778) shows improved model quality.
(A) Comparison of the deposited and Rosetta-refined models, as assessed by MolProbity. Residues reported as violations are colored using the key on the far right. Blue open arrows mark a β-hairpin whose hydrogen-bond geometry was automatically detected and improved in the Rosetta-refined model. (B) An overlay of the asymmetric unit of the deposited (pink) and Rosetta-refined (green) models indicates the magnitude of the conformational changes explored by our refinement approach. (C) Model–map agreement assessed by Fourier shell correlation (y-axis) at each resolution shell (x-axis); the reported resolution (3.4 Å) is shown as a dashed orange line.
Figure 4. Refinement of the TRPV1 channel identifies a previously unmodeled disulfide bond.
(A) An overview of the entire structure, estimating local model uncertainty in two ways: local structural diversity and refined B-factors. Local structural diversity is shown as (left) an overlay of the top 10 Rosetta models and (middle) the top model colored by per-residue deviation; (right) shows the refined per-atom B-factors. Using the model selection method illustrated in the middle panel of Figure 1, the Cα RMSDs within the selected ensemble range from 0.44 to 0.63 Å. The orange square marks a newly identified disulfide bond (C386–C390) revealed by our refinement protocol. (B) A zoomed-in view of the disulfide linkage (C386–C390) identified by the automated method. Note that the sidechain coordinates of C390 were unassigned in the deposited model; for presentation, the C390 sidechain atoms were optimally added by Rosetta on the basis of the deposited backbone torsion angles of C390.
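Per-residue deviation and ensemble Cα RMSDs can both be computed from a stack of Cα coordinates. The sketch below assumes all models share one coordinate frame (as models refined into the same map do), so it performs no superposition; it also assumes the reported RMSD range refers to pairwise RMSDs, which the caption does not specify. `ensemble_spread` is a hypothetical helper name.

```python
from itertools import combinations

import numpy as np


def ensemble_spread(ca):
    """Per-residue deviation and pairwise Cα RMSD range for an ensemble.

    ca: array of shape (n_models, n_residues, 3) with Cα coordinates in Å,
    assumed to share one frame, so no superposition is performed.
    """
    ca = np.asarray(ca, dtype=float)
    mean = ca.mean(axis=0)
    # RMS deviation of each residue from its ensemble-mean position.
    per_residue = np.sqrt(((ca - mean) ** 2).sum(axis=2).mean(axis=0))
    # Cα RMSD for every pair of models.
    pairwise = [np.sqrt(((ca[i] - ca[j]) ** 2).sum(axis=1).mean())
                for i, j in combinations(range(len(ca)), 2)]
    return per_residue, min(pairwise), max(pairwise)


# Toy ensemble: two 3-residue models, the second shifted 1 Å along x.
ca = np.zeros((2, 3, 3))
ca[1, :, 0] = 1.0
per_res, rmsd_min, rmsd_max = ensemble_spread(ca)
print(per_res, rmsd_min, rmsd_max)
```

The per-residue values are what a coloring like the middle panel would map onto the top model; regions where they are large are the least certain parts of the structure.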
Figure 5. Refinement of the F420-reducing [NiFe] hydrogenase (EMD-2513) improves the model geometry.
(A) An illustration comparing the model geometry of the deposited (upper panel) and Rosetta-refined (lower panel) models. Three chains (A/B/C) of the asymmetric unit of the complex are shown as cartoons, with geometry violations reported by MolProbity colored according to the key on the far right. Four iron–sulfur clusters [4Fe4S] and a FAD are shown in stick representation. Metal ions are depicted as spheres (Zn grey, Fe orange, Ni green). (B) Model–map agreement, assessed by Fourier shell correlation (y-axis) as a function of resolution (x-axis), quantifies the improvement following voxel-size refinement. (C) Model quality as assessed by EMRinger and MolProbity. The x-axis shows the methods used to evaluate the models; the y-axis shows the scores under each criterion.
Figure 5—figure supplement 1. The symmetry operators given in the deposited PDB entry (PDB 4ci0) produce a complex that does not fit properly into the deposited density map.
(Left panel) The symmetric complex downloaded from the Protein Data Bank as a biological unit is shifted entirely out of the deposited density map. The middle and right panels show zoomed-in views of two regions of the deposited model, corresponding to the helix and the sheet marked by the orange and cyan squares, respectively, in the left panel.
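A biological assembly is generated by applying each rotation–translation operator (x′ = R·x + t, as in BIOMT-style records) to the deposited coordinates. The sketch below only illustrates how a translation term inconsistent with the map frame displaces the whole assembly away from the map center; it makes no claim about why the 4ci0 operators are off. The helper names and the toy C2 geometry are hypothetical.

```python
import numpy as np


def apply_operators(coords, operators):
    """Build an assembly from one copy using (R, t) operators, x' = R x + t.

    coords: (n_atoms, 3) in Å; operators: list of (3x3 rotation, 3-vector).
    """
    coords = np.asarray(coords, dtype=float)
    return np.vstack([coords @ np.asarray(R, dtype=float).T + np.asarray(t, dtype=float)
                      for R, t in operators])


def centroid_offset(coords, map_center):
    """Distance (Å) between the assembly centroid and a reference point."""
    return float(np.linalg.norm(np.mean(coords, axis=0) - np.asarray(map_center, dtype=float)))


# Toy C2 axis along z through a hypothetical map center at (50, 50, 50) Å.
center = np.array([50.0, 50.0, 50.0])
R = np.diag([-1.0, -1.0, 1.0])
chain = np.array([[60.0, 50.0, 50.0]])
identity = (np.eye(3), np.zeros(3))
good = apply_operators(chain, [identity, (R, center - R @ center)])  # axis through center
bad = apply_operators(chain, [identity, (R, np.zeros(3))])           # axis through origin
print(centroid_offset(good, center), centroid_offset(bad, center))
```

For the correct operator, the translation t = c − R·c makes the rotation axis pass through the map center c, so the assembly stays centered on the density.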
Figure 6. Refinement of the large subunit of the human mitochondrial ribosome (EMD-2762) shows improvements to all subunits.
(A) Scatterplots comparing the model quality of each of the 48 protein chains in the deposited (x-axis) and Rosetta (y-axis) models using MolProbity. On the left, the MolProbity scores of all 48 protein chains are compared; lower values indicate better model geometry. On the right, the percentage of 'Ramachandran favored' residues in each chain is compared; higher values are preferable. (B) An evaluation of the fit-to-density of each protein chain. On the left, we compare the Fourier shell correlation (FSC) of each chain before and after refinement, integrating the FSC from 10 Å to 3.4 Å; higher values indicate better agreement with the data. The largest improvement, chain k, is indicated by the red arrow. On the right, we show the full FSC curve for the deposited model (pink) and the Rosetta-refined model (green); the reported map resolution (3.4 Å) is indicated by the dashed orange line. (C) A zoomed-in view of chain k showing much improved backbone geometry and the large radius of convergence of the refinement. The left panel shows that the density for chain k lies in a region of relatively low local resolution.
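The per-chain fit-to-density score in panel B integrates an FSC curve over the 10 Å to 3.4 Å band. The NumPy sketch below computes an FSC between two cubic maps (for example, a model-derived map and the experimental map) using one-Fourier-pixel shells, then averages it over that band. Taking the mean over shells as a normalized integral is an assumption, since the caption does not give the exact normalization, and the helper names are hypothetical.

```python
import numpy as np


def fsc_curve(map1, map2, voxel_size):
    """Fourier shell correlation between two cubic maps of identical shape."""
    n = map1.shape[0]
    f1, f2 = np.fft.fftn(map1), np.fft.fftn(map2)
    freqs = np.fft.fftfreq(n, d=voxel_size)                # cycles per Å
    kx, ky, kz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    s = np.sqrt(kx**2 + ky**2 + kz**2)                      # spatial frequency, 1/Å
    width = 1.0 / (n * voxel_size)                          # one Fourier pixel
    shell = np.rint(s / width).astype(int)
    s_centers, fsc = [], []
    for k in range(1, n // 2):                              # skip the DC term
        m = shell == k
        num = np.sum(f1[m] * np.conj(f2[m])).real
        den = np.sqrt(np.sum(np.abs(f1[m]) ** 2) * np.sum(np.abs(f2[m]) ** 2))
        s_centers.append(k * width)
        fsc.append(num / den if den > 0 else 0.0)
    return np.array(s_centers), np.array(fsc)


def integrated_fsc(s, fsc, d_low=10.0, d_high=3.4):
    """Mean FSC over shells between d_low and d_high (resolutions in Å)."""
    band = (s >= 1.0 / d_low) & (s <= 1.0 / d_high)
    return float(fsc[band].mean())


rng = np.random.default_rng(1)
model_map = rng.normal(size=(32, 32, 32))
s, f = fsc_curve(model_map, model_map, voxel_size=1.0)
print(integrated_fsc(s, f))  # identical maps give ~1.0
```

Restricting the band to 10–3.4 Å ignores the low-resolution shells, which mostly reflect overall shape, and the shells beyond the reported resolution, which are dominated by noise.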
Figure 6—figure supplement 1. LocalRelax places sidechains better in large systems.
In the case of the mitoribosome, refinement of a particularly well-resolved region of the map (left) led to sidechains that are clearly misaligned with the density (middle). This was due to the poor convergence of our Monte Carlo sidechain-placement approach when applied to systems with more than 1000 residues. Our alternative approach, LocalRelax, which performs many local sidechain optimizations, places sidechains consistently with the density (right).
Figure 6—figure supplement 2. EMRinger analysis of the refinement of the large subunit of the human mitochondrial ribosome.
A scatterplot of model quality assessed by EMRinger for each of the 48 protein chains compares the deposited (x-axis) and Rosetta (y-axis) models.
