Protein Sci. 2021 Jan;30(1):250-261.
doi: 10.1002/pro.3995. Epub 2020 Dec 3.

Using Integrative Modeling Platform to compute, validate, and archive a model of a protein complex structure

Daniel J Saltzberg et al. Protein Sci. 2021 Jan.

Abstract

Biology is advanced by producing structural models of biological systems, such as protein complexes. Some systems are recalcitrant to traditional structure determination methods. In such cases, it may still be possible to produce useful models by integrative structure determination, which depends on the simultaneous use of multiple types of data. An ensemble of models that are sufficiently consistent with the data is produced by a structural sampling method guided by a data-dependent scoring function. The variation in the ensemble of models quantifies the uncertainty of the structure, which generally results from uncertainty in the input information and from actual structural heterogeneity in the samples used to produce the data. Here, we describe how to generate, assess, and interpret ensembles of integrative structural models using our open-source Integrative Modeling Platform program (https://integrativemodeling.org).

Keywords: biophysics; chemical cross-linking; electron microscopy; integrative structure modeling; model validation; protein complexes; structural biology.

Figures

FIGURE 1
The four stages of integrative modeling. The integrative structure determination procedure proceeds through four stages. First, we collect all information describing the system, including experimental data and physical laws. Second, the representation of the system components is chosen, and each piece of input information is translated into a set of spatial restraints. Third, alternative configurations of the system components are sampled. Fourth, the ensemble of models resulting from sampling is filtered by fit to the input data, the sampling precision and model precision are computed, and the resulting model ensembles are validated against information both used and not used in modeling. If the model is not deemed satisfactory, because it is either too imprecise or fits the data poorly, the process can be iterated, adding more information or increasing sampling, until a satisfactory model is obtained. Steps in the fourth stage covered in this tutorial are outlined with thick lines.
FIGURE 2
Data used in constructing the RNA polymerase II stalk model. (a) Primary sequences of all subunits in the FASTA format. (b) Chemical cross‐linking data, which yield a list of proximate residue pairs. (c) A 3D negative‐stain EM density map of the entire complex. (d) X‐ray crystal structures of each of the subunits. Figure published in a previous Protein Science tutorial (ref. 7).
FIGURE 3
Score plots for undersampled and extensively sampled systems. Histograms of individual scores are on the diagonal and 2D score plots are on the off‐diagonals. The colors in the 2D plots represent the different clusters. (a) The score plots for the undersampled modeling show two distinct clusters of scores, indicating that the independent simulations did not converge on the same solution. (b) The score plots for the extensively sampled modeling show a continuous distribution of scores, which suggests more complete sampling.
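One way to put a number on whether the scores from two independent runs overlap is a two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical cumulative distributions. The sketch below is an illustrative assumption rather than the procedure used to produce the figure; the function name `ks_statistic` and the toy score lists are hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical CDFs
    of the two samples: 0.0 for identical samples, 1.0 for samples
    whose value ranges do not overlap at all.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Hypothetical total scores from two independent sets of runs.
converged_1, converged_2 = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]
split_1, split_2 = [1.0, 2.0, 3.0], [10.0, 11.0, 12.0]

print(ks_statistic(converged_1, converged_2))  # overlapping score distributions
print(ks_statistic(split_1, split_2))          # two distinct score clusters
```

A value near 0 corresponds to the continuous, overlapping distribution described for panel (b); a value near 1 corresponds to the separated clusters described for panel (a).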
FIGURE 4
Evaluation of crosslink satisfaction. (a) Histogram of mean crosslink distances for Cluster 0, showing that the majority of observed crosslinks are satisfied at the 30 Å cutoff value (yellow line). (b) Individual crosslink distance distributions for a subset of the crosslinks used to model the complex. The identity of each crosslink is noted in the X‐axis label in the format “Protein1:Residue1 | Protein2:Residue2”. The total range of crosslink distances in the cluster is shown by the whiskers, while the 25th to 75th percentile range is represented by the blue box. The median crosslink distance is represented by an orange line. The 30 Å cutoff value is shown by the green line. Crosslinks that show a fixed distance (orange dashes) have both residue endpoints located in the rigid body subcomplex of this model. The figure containing all crosslink distributions is found in ./rnapolii/analysis/model_analysis/plot_XLs_distance_distributions_cl1.pdf
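The quantities summarized in this caption can be computed from per-model crosslink distances with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: a crosslink is counted as satisfied when its mean distance across the cluster is at or below the 30 Å cutoff, and the function name `summarize_crosslinks`, the input layout, and the toy distances are hypothetical, not taken from the tutorial's analysis scripts.

```python
from statistics import mean, median

CUTOFF_ANGSTROM = 30.0  # cutoff value quoted in the caption


def summarize_crosslinks(distances_by_crosslink, cutoff=CUTOFF_ANGSTROM):
    """Summarize crosslink distances over the models of one cluster.

    distances_by_crosslink maps a label such as
    "Protein1:Residue1 | Protein2:Residue2" to a list of distances (in Å),
    one per model. Returns (per-crosslink summary, fraction satisfied).
    """
    summary = {}
    for label, distances in distances_by_crosslink.items():
        avg = mean(distances)
        summary[label] = {
            "min": min(distances),        # lower whisker
            "median": median(distances),  # orange line
            "mean": avg,                  # value histogrammed in panel (a)
            "max": max(distances),        # upper whisker
            "satisfied": avg <= cutoff,   # assumption: judged on the mean
        }
    n_satisfied = sum(1 for s in summary.values() if s["satisfied"])
    fraction = n_satisfied / len(summary) if summary else 0.0
    return summary, fraction


# Hypothetical toy distances for three crosslinks across three models.
example = {
    "Rpb4:10 | Rpb7:25": [18.0, 22.0, 20.0],
    "Rpb4:50 | Rpb1:100": [35.0, 41.0, 38.0],
    "Rpb1:5 | Rpb2:7": [12.0, 12.0, 12.0],  # fixed distance: both ends in one rigid body
}
summary, fraction_satisfied = summarize_crosslinks(example)
print(f"{fraction_satisfied:.2f} of crosslinks satisfied at {CUTOFF_ANGSTROM} Å")
```

A crosslink whose minimum and maximum coincide, like the third toy entry, corresponds to the fixed-distance case the caption attributes to endpoints inside the rigid body.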
FIGURE 5
Determination of sampling precision and visual validation for undersampled and rigorously sampled models. The results of the three sampling convergence tests at multiple clustering thresholds are plotted (a) for the undersampled set of models, showing a sampling precision of 12.6 Å, and (b) for the rigorously sampled set, showing a sampling precision of 8.6 Å. Comparison of the localization densities of Rpb4 (green volume) and Rpb7 (purple volume) from the two independent sets of models: (c) for the undersampled model set, the two densities differ significantly, while (d) for the extensively sampled set, the pair appears identical.
FIGURE 6
Integrative model of the RNA Polymerase II stalk. The final integrative model for the RNA polymerase II stalk, determined to 8.7 Å precision, is visualized by the localization densities of subunits Rpb4 (green volume) and Rpb7 (purple volume) in relation to the rigid structure of the polymerase base (gray volume). A single representative model of the ensemble, the centroid model, for the two moving components is shown in sphere representation with the same colors. Figure prepared using Chimera.
