One of the main barriers to accurate computational protein structure prediction is searching the vast space of protein conformations. Distance restraints or inter-residue contacts have been used to reduce this search space, easing the discovery of the correct folded state. It has been suggested that about one contact for every 12 residues may be sufficient to predict structure at fold-level accuracy. Here, we use coarse-grained structure-based models in conjunction with molecular dynamics simulations to examine this empirical prediction. We generate sparse contact maps for 15 proteins of varying sequence lengths and topologies and find that, given perfect secondary-structural information, a small fraction of the native contact map (5%-10%) suffices to fold proteins to their correct native states. We also find that different sparse maps are not equivalent, and we make several observations about the types of maps that succeed at such structure prediction. Long-range contacts are found to encode more information than shorter-range ones, especially for α- and αβ-proteins; this distinction is smaller for β-proteins. Choosing contacts that are a consensus from successful maps gives predictive sparse maps, as does choosing contacts that are well spread out over the protein structure. The folding of proteins can also be used to choose predictive sparse maps. Overall, we conclude that structure-based models can be used to understand the efficacy of structure-prediction restraints and could, in the future, be tuned to include specific force-field interactions, secondary structure errors, and noise in the sparse maps.
Keywords: coarse-grained structure-based models; conformational sampling; protein design; protein folding; sparse contact maps; structural restraints.
© 2017 Wiley Periodicals, Inc.
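The idea of a sparse contact map described in the abstract can be illustrated with a minimal sketch. It is not the authors' protocol: the abstract does not say how native contacts are defined or how sparse maps are drawn, so the distance cutoff (`cutoff`), the minimum sequence separation (`min_sep`), the long-range threshold (`sep`), uniform random sampling, and all function names below are assumptions made for illustration. Only the 5%-10% fraction comes from the text.

```python
import numpy as np

def native_contacts(coords, cutoff=8.0, min_sep=4):
    """Return (i, j) residue pairs, i < j, closer than `cutoff`.

    Assumption: a simple distance cutoff on one bead per residue;
    `cutoff` and `min_sep` are illustrative values, not from the paper.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Upper-triangle pairs with sequence separation j - i >= min_sep.
    i, j = np.triu_indices(len(coords), k=min_sep)
    mask = dist[i, j] < cutoff
    return np.column_stack([i[mask], j[mask]])

def sparse_map(contacts, fraction=0.1, rng=None):
    """Keep a random `fraction` of the native contacts (assumed uniform)."""
    if len(contacts) == 0:
        return contacts
    rng = np.random.default_rng(rng)
    n_keep = max(1, int(round(fraction * len(contacts))))
    idx = rng.choice(len(contacts), size=n_keep, replace=False)
    return contacts[np.sort(idx)]

def long_range_fraction(contacts, sep=12):
    """Fraction of contacts with sequence separation >= `sep` (assumed threshold)."""
    if len(contacts) == 0:
        return 0.0
    seps = contacts[:, 1] - contacts[:, 0]
    return float((seps >= sep).mean())

# Toy input: an idealized helix-like trace, used only to exercise the functions.
n = 24
angles = np.deg2rad(100.0 * np.arange(n))
helix = np.column_stack([2.3 * np.cos(angles),
                         2.3 * np.sin(angles),
                         1.5 * np.arange(n)])

contacts = native_contacts(helix)
sparse = sparse_map(contacts, fraction=0.1, rng=0)
print(len(contacts), len(sparse), long_range_fraction(sparse))
```

Replacing the uniform draw in `sparse_map` with a draw weighted by sequence separation would be one way to compare maps enriched in long-range contacts against random ones, in the spirit of the comparison the abstract reports.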