Protein-ligand binding affinity prediction exploiting sequence constituent homology

Abbi Abdel-Rehim; Oghenejokpeme Orhobor; Lou Hang; Hao Ni; Ross D King

doi:10.1093/bioinformatics/btad502

Bioinformatics. 2023 Aug 1;39(8):btad502. doi: 10.1093/bioinformatics/btad502.

Authors

Abbi Abdel-Rehim¹, Oghenejokpeme Orhobor², Lou Hang³, Hao Ni³,⁴, Ross D King¹,⁴,⁵,⁶

Affiliations

¹ Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge CB3 0AS, United Kingdom.
² The National Institute of Agricultural Botany, Cambridge CB3 0LE, United Kingdom.
³ Department of Mathematics, University College London, London WC1H 0AY, United Kingdom.
⁴ The Alan Turing Institute, London NW1 2DB, United Kingdom.
⁵ Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden.
⁶ Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg 412 96, Sweden.

Abstract

Motivation: Molecular docking is a commonly used approach for estimating binding conformations and their resulting binding affinities. Machine learning has been successfully deployed to improve such affinity estimates. Many methods of varying complexity have been developed that use some or all of the spatial and categorical information available in protein-ligand complex structures. These methods have mainly been evaluated on datasets from PDBbind, in particular the Comparative Assessment of Scoring Functions (CASF) 2007, 2013, and 2016 datasets, which come with dedicated test sets. This work demonstrates that only a small number of simple descriptors is needed to estimate binding affinity efficiently for these complexes, without knowing the exact binding conformation of the ligand.
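The abstract does not list the descriptors used. As one illustration of a protein descriptor that needs no binding pose or 3D structure, the hypothetical sketch below (not the authors' code) computes amino acid composition, the fraction of each of the 20 standard residues, directly from a sequence. Choosing this particular descriptor is an assumption, loosely motivated by the title's reference to sequence constituents.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids (fixed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical sequence-only descriptor: it uses no structure or binding
    pose. Non-standard residue codes (e.g. 'X') are ignored.
    """
    counts = Counter(res for res in sequence.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or entirely non-standard sequence: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Toy sequence for illustration only, not taken from the paper's data.
    features = amino_acid_composition("MKTAYIAKQR")
    print(len(features))
    print(dict(zip(AMINO_ACIDS, features)))
```

A fixed-length vector like this can be concatenated with ligand descriptors regardless of protein length, which is what makes it usable when no complex structure is available.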

Results: The proposed approach, which combines a small number of ligand and protein descriptors with gradient boosting trees, performs well on the CASF datasets, including the commonly used CASF2016 benchmark, on which it appears to outperform all other approaches. The method is also useful for datasets in which the spatial relationship between the ligand and protein is unknown, as demonstrated on a large ChEMBL-derived dataset.
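A minimal from-scratch sketch of gradient boosting trees on concatenated ligand and protein descriptor vectors follows. To stay short it assumes squared-error loss and depth-1 trees (stumps); the class and function names, hyperparameters, and toy data are hypothetical, and the abstract does not specify the authors' library, tree depth, or settings. In practice an established library such as XGBoost, LightGBM, or scikit-learn would normally be used.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """Depth-1 regression tree: a single threshold split on one feature."""

    feature: int
    threshold: float
    left_value: float
    right_value: float

    def predict(self, x: list[float]) -> float:
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def fit_stump(X: list[list[float]], residuals: list[float]) -> Stump:
    """Return the stump that minimises squared error on the current residuals."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
        for t in thresholds:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            left_mean = sum(left) / len(left) if left else 0.0
            right_mean = sum(right) / len(right) if right else 0.0
            err = sum((r - left_mean) ** 2 for r in left) + sum(
                (r - right_mean) ** 2 for r in right
            )
            if err < best_err:
                best, best_err = Stump(j, t, left_mean, right_mean), err
    return best


class StumpBoostingRegressor:
    """Hypothetical gradient boosting regressor with squared-error loss."""

    def __init__(self, n_estimators: int = 100, learning_rate: float = 0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def fit(self, X: list[list[float]], y: list[float]) -> "StumpBoostingRegressor":
        # Start from the mean target, the best constant under squared loss.
        self.init_ = sum(y) / len(y)
        self.stumps_: list[Stump] = []
        pred = [self.init_] * len(y)
        for _ in range(self.n_estimators):
            # For squared loss the negative gradient is the residual y - pred.
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            stump = fit_stump(X, residuals)
            self.stumps_.append(stump)
            pred = [p + self.learning_rate * stump.predict(x) for p, x in zip(pred, X)]
        return self

    def predict(self, X: list[list[float]]) -> list[float]:
        return [
            self.init_ + self.learning_rate * sum(s.predict(x) for s in self.stumps_)
            for x in X
        ]


if __name__ == "__main__":
    # Synthetic toy data (not from the paper). Each row is
    # [ligand desc. 1, ligand desc. 2, protein desc. 1, protein desc. 2].
    X_train = [
        [0.30, 2.0, 0.08, 0.05],
        [0.45, 1.0, 0.07, 0.06],
        [0.50, 3.0, 0.09, 0.04],
        [0.62, 2.0, 0.06, 0.07],
        [0.70, 4.0, 0.08, 0.05],
    ]
    y_train = [5.1, 6.0, 6.3, 7.2, 7.9]  # synthetic affinity-like values
    model = StumpBoostingRegressor(n_estimators=50, learning_rate=0.1).fit(X_train, y_train)
    print(model.predict([[0.55, 2.0, 0.07, 0.05]]))
```

With squared-error loss the negative gradient is simply the residual, so each new tree is fitted to what the current ensemble still gets wrong, and the learning rate shrinks each tree's contribution to reduce overfitting.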

Availability and implementation: Code and data are available at https://github.com/abbiAR/PLBAffinity.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Ligands
Machine Learning*
Molecular Docking Simulation
Protein Binding
Proteins* / chemistry

Substances

Ligands
Proteins