A Searchable Database of Crystallization Cocktails in the PDB: Analyzing the Chemical Condition Space

Patterns (N Y). 2020 Jul 10;1(4):100024. doi: 10.1016/j.patter.2020.100024. Epub 2020 Apr 28.

Abstract

Nearly 90% of structural models in the Protein Data Bank (PDB), the central resource worldwide for three-dimensional structural information, are currently derived from macromolecular crystallography (MX). A major bottleneck in determining MX structures is finding conditions in which a biomolecule will crystallize. Here, we present a searchable database of the chemicals associated with successful crystallization experiments from the PDB. We use these data to examine the relationship between protein secondary structure and average molecular weight of polyethylene glycol and to investigate patterns in crystallization conditions. Our analyses reveal striking patterns of both redundancy of chemical compositions in crystallization experiments and extreme sparsity of specific chemical combinations, underscoring the challenges faced in generating predictive models for de novo optimal crystallization experiments.