Protein aggregation has attracted considerable attention within the pharmaceutical industry, both from a developability standpoint (to ensure the stability of protein formulations) and from a research perspective for neurodegenerative diseases. Experimental identification of aggregation behavior in proteins can be expensive; hence, the development of accurate computational approaches is crucial. Existing methods for predicting protein aggregation rely mostly on the primary sequence and are typically trained on amyloid-like proteins. However, this training bias toward amyloid beta peptides may worsen the prediction accuracy of such models when they are applied to larger protein systems. Here, we present a novel algorithm, termed "AggScore," that identifies aggregation-prone regions in proteins and is based entirely on three-dimensional structure input. The method uses the distribution of hydrophobic and electrostatic patches on the surface of the protein, factoring the intensity and relative orientation of the respective surface patches into an aggregation propensity function that has been trained on a benchmark set of 31 adnectin proteins. AggScore can accurately identify aggregation-prone regions in several well-studied proteins and can also reliably predict changes in aggregation behavior upon residue mutation. The method does not assume an amyloid-specific aggregation context and thus may be applied to globular proteins, small peptides, and antibodies.
Keywords: adnectin; aggregation propensity; aggregation score; amyloid beta; antibody; electrostatic patches; hydrophobic patches; protein aggregation.
© 2018 Wiley Periodicals, Inc.