Comparative assessment of strategies to identify similar ligand-binding pockets in proteins

BMC Bioinformatics. 2018 Mar 9;19(1):91. doi: 10.1186/s12859-018-2109-2.

Abstract

Background: Detecting similar ligand-binding sites in globally unrelated proteins has a wide range of applications in modern drug discovery, including drug repurposing and the prediction of side effects and drug-target interactions. Although a number of techniques to compare binding pockets have been developed, this problem still poses significant challenges.

Results: We evaluate the performance of three algorithms that calculate similarities between ligand-binding sites: APoc, SiteEngine, and G-LoSA. Our assessment considers not only their ability to identify similar pockets and to construct accurate local alignments, but also the dependence of these alignments on the sequence order. We point out certain drawbacks of previously compiled datasets, such as the inclusion of structurally similar proteins, which leads to overestimated performance. To address these issues, we propose a rigorous procedure to prepare unbiased, high-quality benchmarking sets. Further, we compare techniques that directly align binding pockets with indirect strategies that employ structure-based virtual screening with AutoDock Vina and rDock.

Conclusions: Thorough benchmarks reveal that G-LoSA offers a fairly robust overall performance, whereas the accuracy of APoc and SiteEngine is satisfactory only against easy datasets. Moreover, combining various algorithms into a meta-predictor improves the performance of existing methods to detect similar binding sites in unrelated proteins by 5-10%. All data reported in this paper are freely available at https://osf.io/6ngbs/.
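
The idea of combining several pocket-comparison scores and judging the result by the area under the ROC curve can be illustrated with a minimal sketch. The abstract does not say how the meta-predictor is built, so simple rank averaging is swapped in here as a stand-in. The tool names, scores, and labels below are hypothetical, and the sketch assumes that a higher score always means greater pocket similarity.

```python
from typing import Dict, List, Sequence


def ranks(scores: Sequence[float]) -> List[float]:
    """Return 1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative pair")
    r = ranks(scores)
    pos_rank_sum = sum(ri for ri, y in zip(r, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def meta_scores(per_tool: Dict[str, Sequence[float]]) -> List[float]:
    """Average each pocket pair's rank across tools (assumed combination rule)."""
    tool_ranks = [ranks(s) for s in per_tool.values()]
    n = len(tool_ranks[0])
    return [sum(tr[i] for tr in tool_ranks) / len(tool_ranks) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical data: 1 = pockets bind similar ligands, 0 = they do not.
    labels = [1, 1, 0, 0, 1, 0]
    per_tool = {
        "tool_a": [0.62, 0.41, 0.45, 0.20, 0.58, 0.33],
        "tool_b": [0.71, 0.66, 0.30, 0.52, 0.49, 0.28],
        "tool_c": [12.0, 9.5, 8.1, 7.7, 10.2, 9.9],
    }
    for name, scores in per_tool.items():
        print(f"{name}: AUC = {roc_auc(labels, scores):.3f}")
    print(f"meta:   AUC = {roc_auc(labels, meta_scores(per_tool)):.3f}")
```

Averaging ranks rather than raw scores sidesteps the fact that different tools report similarity on different scales. A trained combiner would be a natural alternative when labeled benchmark pairs are available.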

Keywords: Drug design; Drug repositioning; Drug repurposing; Drug side-effect; Ligand-binding sites; Pocket comparison; Structure-based virtual screening.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Area Under Curve
  • Binding Sites
  • Databases, Protein
  • Drug Discovery
  • Ligands
  • Models, Molecular
  • Protein Binding
  • Protein Conformation
  • Proteins / chemistry
  • Proteins / metabolism*
  • ROC Curve
  • Sequence Alignment

Substances

  • Ligands
  • Proteins