MolProbity: More and better reference data for improved all-atom structure validation

Protein Sci. 2018 Jan;27(1):293-315. doi: 10.1002/pro.3330. Epub 2017 Nov 27.

Abstract

This paper describes the current update of the macromolecular model validation services provided at the MolProbity website, emphasizing changes and additions since the previous review in 2010. There have been many infrastructure improvements, including the rewriting of earlier Java utilities to use existing or newly written Python utilities in the open-source CCTBX portion of the Phenix software system. This improves long-term maintainability and strengthens the integration of MolProbity-style validation within Phenix. There is now a complete MolProbity mirror site at http://molprobity.manchester.ac.uk. GitHub serves our open-source code, reference datasets, and the resulting multi-dimensional distributions that define most validation criteria. Coordinate output after Asn/Gln/His "flip" correction is now more idealized, since the post-refinement step has apparently often been skipped in the past. Two distinct sets of heavy-atom-to-hydrogen distances and accompanying van der Waals radii have been researched and improved in accuracy: one for the electron-cloud-center positions suitable for X-ray crystallography and one for nuclear positions. New validations include messages at input about problem-causing format irregularities; updates of Ramachandran and rotamer criteria from the million quality-filtered residues in a new reference dataset; the CaBLAM Cα-CO virtual-angle analysis of backbone and secondary structure for cryoEM or low-resolution X-ray; and flagging of the very rare cis-nonProline and twisted peptides, which have recently been greatly overused. Owing to the wide application of MolProbity validation and corrections by the research community, in Phenix, and at the worldwide Protein Data Bank, newly deposited structures have continued to improve greatly as measured by MolProbity's unique all-atom clashscore.

Keywords: Asn/Gln/His flip; CCTBX; CaBLAM; Top8000; all-atom contact analysis; cis non-proline; electron-cloud hydrogen position.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Protein*
  • Models, Molecular*
  • Programming Languages*
  • Proteins / chemistry*
  • Proteins / genetics

Substances

  • Proteins

Associated data

  • PDB/4pr6
  • PDB/1gwe
  • PDB/1yk4
  • PDB/1bkr
  • PDB/1xk8
  • PDB/1s72
  • PDB/2o01
  • PDB/1qw9
  • PDB/3gx5