Prediction of protein-ligand interactions from paired protein sequence motifs and ligand substructures

Pac Symp Biocomput. 2018;23:20-31.


Identifying small-molecule ligands that bind to proteins is a critical step in drug discovery. Computational methods have been developed to accelerate the prediction of protein-ligand binding, but they often depend on 3D protein structures. Because only a limited number of protein 3D structures have been resolved, the ability to predict protein-ligand interactions without a 3D representation would be highly valuable. We use an interpretable confidence-rated boosting algorithm to predict protein-ligand interactions with high accuracy from ligand chemical substructures and protein 1D sequence motifs, without relying on 3D protein structures. We compare several protein motif definitions, assess how well our model's predictions generalize to unseen proteins and ligands, demonstrate recovery of well-established interactions, and identify globally predictive protein-ligand motif pairs. By bridging biological and chemical perspectives, we demonstrate that protein-ligand interactions can be predicted using only motif-based features, and that interpreting these features can reveal new insights into the molecular mechanics underlying each interaction. Our work also lays a foundation for exploring more predictive feature sets and more sophisticated machine learning approaches, as well as other applications such as predicting unintended interactions or the effects of mutations.
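The idea of learning from paired protein motifs and ligand substructures can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the motif definitions, the substructure vocabulary, or the form of the weak learners. The sketch assumes overlapping k-mers as a stand-in for sequence motifs, uses hypothetical substructure labels, and swaps in a simple Schapire-Singer style confidence-rated boosting over one-feature stumps. All sequences and labels below are invented toy data.

```python
import math
from itertools import product


def protein_motifs(seq, k=3):
    """Overlapping k-mers, an assumed stand-in for 1D sequence motifs."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pair_features(seq, substructures, k=3):
    """One binary feature per (protein motif, ligand substructure) pair."""
    return set(product(protein_motifs(seq, k), substructures))


def train_boosting(examples, labels, rounds=3, eps=1e-3):
    """Confidence-rated boosting with stumps that test one paired feature.

    examples: list of feature sets; labels: +1 (binds) or -1 (does not bind).
    Returns a list of (feature, score_if_present, score_if_absent).
    """
    n = len(examples)
    weights = [1.0 / n] * n
    vocabulary = sorted(set().union(*examples))
    model = []
    for _ in range(rounds):
        best = None
        for feat in vocabulary:
            # Weighted mass of each label in the "present" and "absent" blocks.
            w = {(b, y): 0.0 for b in (True, False) for y in (1, -1)}
            for x, y, d in zip(examples, labels, weights):
                w[(feat in x, y)] += d
            # Normalizer Z; a smaller Z means a more informative split.
            z = 2 * sum(math.sqrt(w[(b, 1)] * w[(b, -1)]) for b in (True, False))
            if best is None or z < best[0]:
                # Smoothed, real-valued (confidence-rated) block predictions.
                c_present = 0.5 * math.log((w[(True, 1)] + eps) / (w[(True, -1)] + eps))
                c_absent = 0.5 * math.log((w[(False, 1)] + eps) / (w[(False, -1)] + eps))
                best = (z, feat, c_present, c_absent)
        _, feat, c_present, c_absent = best
        model.append((feat, c_present, c_absent))
        # Down-weight examples the new stump already scores correctly.
        weights = [
            d * math.exp(-y * (c_present if feat in x else c_absent))
            for x, y, d in zip(examples, labels, weights)
        ]
        total = sum(weights)
        weights = [d / total for d in weights]
    return model


def score(model, features):
    """Sum of stump outputs; the sign is the predicted label."""
    return sum(c_p if feat in features else c_a for feat, c_p, c_a in model)


# Invented toy data: (protein sequence, ligand substructures, label).
toy = [
    ("MKVLAGHK", {"carboxyl", "phenyl"}, 1),
    ("MKVLTTSP", {"carboxyl"}, 1),
    ("GGSPLAGH", {"phenyl"}, -1),
    ("TTSPQQRW", {"amine"}, -1),
]
examples = [pair_features(seq, subs) for seq, subs, _ in toy]
labels = [y for _, _, y in toy]
model = train_boosting(examples, labels)

# Unseen protein sharing a motif with the positives, paired with two ligands.
seen_pair = score(model, pair_features("AAMKVLPP", {"carboxyl"}))
other_pair = score(model, pair_features("AAMKVLPP", {"amine"}))
print(model[0][0], seen_pair > 0, other_pair > 0)
```

Because each stump tests a single (motif, substructure) pair, the learned model can be read directly: the features with the largest absolute scores are the pairs the model treats as most predictive. The same protein scores differently against different ligands, which is why the features are built from pairs rather than from proteins or ligands alone.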

Publication types

  • Validation Study

MeSH terms

  • Algorithms
  • Amino Acid Motifs*
  • Computational Biology
  • Databases, Protein
  • Drug Discovery / methods*
  • Drug Discovery / statistics & numerical data
  • Humans
  • Ligands
  • Machine Learning
  • Models, Chemical
  • Molecular Structure
  • Protein Binding
  • Proteins / chemistry
  • Quantitative Structure-Activity Relationship


Substances

  • Ligands
  • Proteins