ppdx: Automated modeling of protein-protein interaction descriptors for use with machine learning

J Comput Chem. 2022 Aug 5. doi: 10.1002/jcc.26974. Online ahead of print.


This paper describes ppdx, a Python workflow tool that combines protein sequence alignment, homology modeling, and structural refinement to compute a broad array of descriptors characterizing protein-protein interactions. The descriptors can be used to predict properties of interest, such as protein-protein binding affinities or half-maximal inhibitory concentrations (IC50), with approaches ranging from simple regression to more complex machine learning models. The software is highly modular: it supports several protocols for generating structures, currently computes 95 descriptors, and allows further protocols and descriptors to be added easily. The implementation is highly parallel and can fully exploit the available cores of a single workstation or multiple nodes of a supercomputer, so that many systems can be analyzed simultaneously. As an illustrative application, ppdx is used to parametrize a model that predicts the IC50 values for a set of antigens and a class of antibodies directed against the influenza hemagglutinin stalk.
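The descriptor-to-property step summarized above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the ppdx API: the registry (`DESCRIPTORS`, `descriptor`), the two toy descriptors (`buried_area`, `hbonds`), and all numeric values are assumptions made for this example. It shows three ideas in miniature: a new descriptor is added by registering one more function, descriptor evaluation is mapped over many systems concurrently, and an ordinary least-squares fit (with intercept, solved via the normal equations) maps descriptor vectors to pIC50 = -log10(IC50 in mol/L). The pIC50 transform is a common log-scale target that is assumed here rather than taken from the abstract.

```python
"""Toy sketch: descriptor registry, concurrent evaluation, least-squares fit.

All names and values are hypothetical; this is NOT the ppdx API.
"""
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical registry: adding a descriptor means registering one more function.
DESCRIPTORS = {}


def descriptor(name):
    def register(func):
        DESCRIPTORS[name] = func
        return func
    return register


# A "system" is a dict of made-up per-complex quantities standing in for a
# modeled protein-protein interface.
@descriptor("buried_area")
def buried_area(system):
    return float(system["buried_area"])


@descriptor("hbonds")
def hbonds(system):
    return float(system["hbonds"])


def compute_descriptors(system):
    # Evaluate descriptors in sorted-name order so feature columns always line up.
    return [DESCRIPTORS[name](system) for name in sorted(DESCRIPTORS)]


def to_pic50(ic50_molar):
    # Log-scale target: pIC50 = -log10(IC50 in mol/L).
    return -math.log10(ic50_molar)


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_least_squares(features, targets):
    """Ordinary least squares with intercept via the normal equations X^T X w = X^T y."""
    X = [[1.0] + row for row in features]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)  # [intercept, w_buried_area, w_hbonds]


def predict(weights, row):
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Synthetic systems, chosen so that pIC50 = 1 + 0.002 * area + 0.1 * hbonds exactly.
systems = [
    {"buried_area": 800.0, "hbonds": 4},
    {"buried_area": 1200.0, "hbonds": 6},
    {"buried_area": 1000.0, "hbonds": 10},
    {"buried_area": 1500.0, "hbonds": 3},
]
pic50 = [3.0, 4.0, 4.0, 4.3]

with ThreadPoolExecutor() as pool:
    features = list(pool.map(compute_descriptors, systems))

weights = fit_least_squares(features, pic50)
print([round(w, 6) for w in weights])
```

Threads keep the sketch portable, but pure-Python descriptor code is CPU-bound, so a `ProcessPoolExecutor` (which needs picklable, module-level functions and a `__main__` guard) is the usual way to use several cores; spreading work over supercomputer nodes would need a separate mechanism such as MPI. The normal equations are adequate for a toy problem like this one, while QR-based or library least-squares solvers are more robust for many correlated descriptors.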

Keywords: binding affinity; machine learning; protein interaction descriptors; protein-protein interactions; scoring functions.