A distance geometry-based description and validation of protein main-chain conformation

IUCrJ. 2017 Aug 8;4(Pt 5):657-670. doi: 10.1107/S2052252517008466. eCollection 2017 Sep 1.


Understanding the protein main-chain conformational space forms the basis for the modelling of protein structures and for the validation of models derived from structural biology techniques. Presented here is a novel idea for a three-dimensional distance geometry-based metric to account for the fine details of protein backbone conformations. The metrics are computed for dipeptide units, defined as blocks of Cαᵢ₋₁-Oᵢ₋₁-Cαᵢ-Oᵢ-Cαᵢ₊₁ atoms, by obtaining the eigenvalues of their Euclidean distance matrices. These were computed for ∼1.3 million dipeptide units collected from nonredundant good-quality structures in the Protein Data Bank and subjected to principal component analysis. The resulting new Euclidean orthogonal three-dimensional space (DipSpace) allows a probabilistic description of protein backbone geometry. The three axes of the DipSpace describe the local extension of the dipeptide unit structure, its twist and its bend. By using a higher-dimensional metric, the method is efficient for the identification of Cα atoms in an unlikely or unusual geometrical environment, and its use for both local and overall validation of protein models is demonstrated. It is also shown, for the example of trypsin proteases, that the detection of unusual conformations that are conserved among the structures of this protein family may indicate geometrically strained residues of potentially functional importance.
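
As a rough illustration of the pipeline outlined in the abstract (distance matrix of a five-atom block, its eigenvalues, then principal component analysis), the following minimal Python sketch may be helpful. It is not the authors' implementation. The abstract does not state whether plain or squared distances are used, which eigenvalues are retained, or how the data are scaled before PCA, so the choices below (plain distances, all five eigenvalues, mean-centring only) are assumptions. The function names `distance_matrix`, `edm_eigenvalues` and `pca_project` are hypothetical, and random coordinates stand in for real dipeptide units from the Protein Data Bank.

```python
import numpy as np


def distance_matrix(coords):
    """Pairwise Euclidean distances for an (n_atoms, 3) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def edm_eigenvalues(coords):
    """Eigenvalues (ascending) of the symmetric distance matrix of one block.

    Assumption: plain (not squared) distances and all eigenvalues are kept.
    """
    return np.linalg.eigvalsh(distance_matrix(coords))


def pca_project(features, n_components=3):
    """Project mean-centred feature rows onto the leading principal axes."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Placeholder data: 50 random five-atom blocks instead of real
# Ca(i-1), O(i-1), Ca(i), O(i), Ca(i+1) coordinates.
rng = np.random.default_rng(0)
blocks = rng.normal(size=(50, 5, 3))

features = np.array([edm_eigenvalues(block) for block in blocks])
projected = pca_project(features)  # one 3D point per block
```

Because the eigenvalues depend only on interatomic distances, they are unchanged by rotating or translating a block, which is what makes such a descriptor independent of the coordinate frame of the model.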

Keywords: Euclidean orthogonal three-dimensional space; Ramachandran plot; dipeptide unit; distance matrix; geometrical strain; protein stereochemistry; trypsin proteases; validation.