Acta Crystallogr D Struct Biol. 2018 Feb 1;74(Pt 2):132-142.
doi: 10.1107/S2059798317009834. Epub 2018 Feb 1.

Model validation: local diagnosis, correction and when to quit

Jane S Richardson et al. Acta Crystallogr D Struct Biol. 2018.

Abstract

Traditionally, validation was considered to be a final gatekeeping function, but refinement is smoother and results are better if model validation actively guides corrections throughout structure solution. This shifts emphasis from global to local measures: primarily geometry, conformations and sterics. A fit into the wrong local minimum conformation usually produces outliers in multiple measures. Moving to the right local minimum should be prioritized, rather than small shifts across arbitrary borderlines. Steric criteria work best with all explicit H atoms. ‘Backrub’ motions should be used for side chains and ‘P-perp’ diagnostics to correct ribose puckers. A ‘water’ may actually be an ion, a relic of misfitting or an unmodeled alternate. Beware of wishful thinking in modeling ligands. At high resolution, internally consistent alternate conformations should be modeled and geometry in poor density should not be downweighted. At low resolution, CaBLAM should be used to diagnose protein secondary structure and ERRASER to correct RNA backbone. Not every atom should be forced inside density, sequence misalignment should be watched for, and very rare conformations such as cis-non-Pro peptides should be avoided. Automation continues to improve, but the crystallographer still must look at each outlier, in the context of density, and correct most of them. For the valid few with unambiguous density and something that is holding them in place, a functional reason should be sought. The expectation is a few outliers, not zero.

Keywords: MolProbity; all-atom contacts; likelihood; outlier correction; structure validation.

Figures

Figure 1
The H atoms are really there. Tyr13 in PDB entry 1yk4 for rubredoxin at 0.69 Å resolution, with 2mFo − DFc map (black) contoured at 1.5σ and mFo − DFc (blue) at 2.8σ. All H atoms are visible, including the OH donor to a nearby backbone O.
Figure 2
(a) The process of defining all-atom contacts. Small, pale dots are the van der Waals envelope of covalently bonded atoms, including H atoms. A probe sphere of 0.25 Å radius (gray) is rolled on this surface, leaving color-coded probe dots where it intersects with the surface of another atom. Favorable van der Waals contacts from just touching to 0.5 Å apart produce paired patches of green (close) and blue (more distant) contacts, while overlaps are either favorable hydrogen bonds between donor–acceptor atom pairs or repulsive overlaps in warm colors, with serious clashes of ≥0.4 Å shown by hot-pink spikes. (b) A definitive assignment of an amide ‘flip’ for Gln115 in PDB entry 1gk8. The larger NH2 group gives clashes and there are no hydrogen bonds in the original position, while the flip shows three hydrogen bonds and no clashes. The REDUCE flip is confirmed as correct by the higher electron density in the assigned O position.
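
The distance thresholds in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the Probe or MolProbity implementation: it classifies a single atom pair from the gap between van der Waals surfaces instead of rolling a probe sphere and counting dots. The `Atom` class, the function names, the category labels and the radii used in the usage lines are hypothetical choices made for illustration; only the 0.5 Å contact range and the 0.4 Å serious-clash cutoff come from the caption.

```python
from dataclasses import dataclass
import math

# Thresholds taken from the Figure 2 caption (in angstroms).
VDW_CONTACT_MAX_GAP = 0.5    # "just touching to 0.5 A apart"
SERIOUS_CLASH_OVERLAP = 0.4  # "serious clashes of >= 0.4 A"


@dataclass(frozen=True)
class Atom:
    """Hypothetical minimal atom record; radii are supplied by the caller."""
    xyz: tuple
    vdw_radius: float
    is_donor_h: bool = False   # polar H attached to a donor
    is_acceptor: bool = False  # hydrogen-bond acceptor


def surface_gap(a: Atom, b: Atom) -> float:
    """Distance between van der Waals surfaces; negative means overlap."""
    return math.dist(a.xyz, b.xyz) - (a.vdw_radius + b.vdw_radius)


def classify_contact(a: Atom, b: Atom) -> str:
    """Assign one atom pair to a contact category (simplified assumption)."""
    gap = surface_gap(a, b)
    if gap > VDW_CONTACT_MAX_GAP:
        return "none"
    if gap >= 0.0:
        return "vdw_contact"
    # Overlapping surfaces: favorable only for a donor-acceptor pair.
    # Simplification: every donor-acceptor overlap is treated as a hydrogen bond.
    if (a.is_donor_h and b.is_acceptor) or (b.is_donor_h and a.is_acceptor):
        return "hydrogen_bond"
    if -gap >= SERIOUS_CLASH_OVERLAP:
        return "serious_clash"
    return "small_overlap"


# Usage with illustrative radii (1.7 A for both atoms is an assumption).
c1 = Atom((0.0, 0.0, 0.0), 1.7)
c2 = Atom((2.9, 0.0, 0.0), 1.7)
print(classify_contact(c1, c2))  # surfaces overlap by 0.5 A
```

Working per atom pair keeps the sketch short; the dot-based procedure described in the caption also yields the size and direction of each contact patch, which this simplification discards.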
Figure 3
Plot of average clashscore versus year for mid-resolution PDB depositions worldwide, showing a steady improvement since the introduction of MolProbity.
Figure 4
Key to the markup for various categories of validation outliers in MolProbity’s three-dimensional graphics, as also seen in the figures here.
Figure 5
The six plots of data distributions and contours used for current Ramachandran validation. The million residues of quality-filtered data from the Top8000 data set are color-coded in 0.1° pixels, from gray for one data point to bright yellow for the highest density (30–45 data points per pixel in the general distribution). For Gly the outer contours are symmetric but the data are not, since Gly serves different functional roles in the α and the Lα regions.
Figure 6
(a) Data-point distribution and 2% contour in the χ2 (into page), χ3 (vertical) and χ4 (near-horizontal) dimensions for quality-filtered Arg residues with χ1 trans. (b) A valid rotamer outlier: Gln321 in PDB entry 1n83, with its eclipsed χ2 (−119°) held by three side-chain hydrogen bonds and clearly validated by the electron density.
Figure 7
(a) An easy outlier correction. The methyl of Mse351 in PDB entry 1j58, as deposited, shows several bad clashes and the side chain is a rotamer outlier. The favorable mmm rotamer fixes both problems. (b) The backrub motion, shown as a schematic of the small-amplitude backrub rotation around the Cα(i−1) to Cα(i+1) axis with leverage on the Cα–Cβ direction and on side-chain contacts, and an example of alternate conformations in Ile47 of the 1n9b β-sheet, where the backrub shift allows good packing in two distinct rotamers.
Figure 8
Interpreting ‘water’ peaks. (a) When they clash with nonpolar atoms then they are likely to be the next atoms in an unmodeled alternate (b), as shown for Asp9 in PDB entry 1eb6. (c) Do not let a ‘water’ push a side-chain atom out of its density, as happened for Ile195 in PDB entry 3js8. (d) A good water peak should show density separated from other atoms, with at least one polar interaction at good hydrogen-bond distance; here, HOH 543 in PDB entry 3js8 makes two good hydrogen bonds. Contours are at 1.2σ.
Figure 9
Handling alternate conformations. (a) Use peak heights to assign consistent alternate IDs, including partial occupancies for interacting waters, which was not performed here. Leu105 from PDB entry 1gwe at 1.2 and 3σ. (b) When any alternate backbone atom is widely separated, do not rejoin alternates until the flanking Cα atoms in order to avoid bad geometry. Here, for Asp42 from PDB entry 1w0n, there are bond-length outliers up to 8σ and bond-angle outliers up to 12σ.
Figure 10
Difficulties at low resolution. (a) Higher resolution shows that this is a regular β-strand with no outliers or clashes, but in (b) the backbone CO directions are misoriented because they are not observed, and the side chains are pulled inwards towards density nubbins. Contours are at 1.2σ. (c) Localized outliers for a long sequence misalignment in 70S ribosomal protein L27 at 3.2 Å resolution (PDB entry 3i1n). (d) After rebuilding of the one- to three-residue sequence shifts in the improved PDB entry 4gd1 (Dunkle et al., 2011).
Figure 11
CaBLAM low-resolution diagnosis. (a) Plot of CO versus Cα-in virtual dihedrals, with contours for the good β-sheet reference data and white data points for examples that have three adjacent CO bonds parallel rather than alternating. (b) CaBLAM scoring for three such outliers, showing definitively that they should be fitted as regular β structure. PDB entry 3i1n; contours at 1.2σ.
Figure 12
RNA backbone conformers and corrections. (a) Definition of the suite divisions (sugar-to-sugar) of RNA backbone. (b) Using two-character suite names to describe the backbone conformation of the GNRA tetraloop. (c) Original conformation of two touching loops in the riboswitch (PDB entry 2gis) at 2.9 Å resolution, with clashes, bad ribose puckers and four out of five outlier suite conformers(!!). (d) After correction using ERRASER.
Figure 13
Use and overuse of very rare cis-non-Pro peptides. (a) A clear, genuine cis-non-Pro in PDB entry 2ddx at 0.86 Å resolution, flagged by the seagreen trapezoid. (b) Time-course plot of the epidemic overuse of cis-non-Pro peptides. (c) An example of how cis peptides can fit better than trans peptides into patchy, poor electron density at 1.2σ. PDB entry 2j82, 1092 loop.
Figure 14
Likelihood-based choice of cis versus trans peptides. (a) Original cis model of Lys–Gly270 in PDB entry 2cn3, with a decent fit to the contours at 1.2 and 3σ. (b) Model rebuilt as trans, with even better fit, a hydrogen bond rather than a clash, and a log-likelihood gain of 75.6, plus eight units better log prior probability.
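
The two numbers in the Figure 14 caption can be combined in a short worked example. By Bayes' rule the log posterior is the log likelihood plus the log prior (up to a constant shared by both models), so the gains add. This sketch assumes that the caption's two figures are expressed on the same logarithmic scale; the function name is hypothetical.

```python
def log_posterior_gain(log_likelihood_gain: float, log_prior_gain: float) -> float:
    """Gain in log posterior when switching models (here cis to trans).

    log P(model | data) = log P(data | model) + log P(model) + constant,
    so the constant cancels when two models are compared.
    """
    return log_likelihood_gain + log_prior_gain


# Values quoted in the Figure 14 caption for Lys-Gly270 in PDB entry 2cn3.
gain = log_posterior_gain(log_likelihood_gain=75.6, log_prior_gain=8.0)
print(f"trans is favored over cis by {gain:.1f} log units")
```

Here the data term dominates: the prior against cis-non-Pro peptides only reinforces a choice that the density already supports.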
Figure 15
A take-home message.

References

    Adams, P. D. et al. (2010). Acta Cryst. D66, 213–221.
    Adams, P. D., Baker, D., Brunger, A. T., Das, R., DiMaio, F., Read, R. J., Richardson, D. C., Richardson, J. S. & Terwilliger, T. C. (2013). Annu. Rev. Biophys. 42, 265–287.
    Arendall, W. B. III, Tempel, W., Richardson, J. S., Zhou, W., Wang, S., Davis, I. W., Liu, Z.-J., Rose, J. P., Carson, W. M., Luo, M., Richardson, D. C. & Wang, B.-C. (2005). J. Struct. Funct. Genomics, 6, 1–11.
    Berkholz, D. S., Driggers, C. D., Shapovalov, M. V., Dunbrack, R. L. Jr & Karplus, P. A. (2012). Proc. Natl Acad. Sci. USA, 109, 449–453.
    Berkholz, D. S., Shapovalov, M. V., Dunbrack, R. L. Jr & Karplus, P. A. (2009). Structure, 17, 1316–1325.
