Commun Biol. 2021 Nov 19;4(1):1311.
doi: 10.1038/s42003-021-02826-3.

PremPLI: a machine learning model for predicting the effects of missense mutations on protein-ligand interactions

Tingting Sun et al. Commun Biol. 2021.

Abstract

Resistance to small-molecule drugs is the main cause of therapeutic failure in clinical practice. Missense mutations that alter the binding of ligands to proteins are one of the critical mechanisms underlying genetic disease and drug resistance. Computational methods have made considerable progress in predicting binding affinity changes and identifying resistance mutations, but their accuracy and speed remain unsatisfactory and need further improvement. To address these issues, we introduce PremPLI, a structure-based machine learning method for quantitatively estimating the effects of single mutations on ligand binding affinity. A comprehensive comparison of PremPLI with other available methods on two benchmark datasets confirms that our approach performs robustly and achieves similar or even higher predictive accuracy than approaches relying on first-principle statistical mechanics and mixed physics- and knowledge-based potentials, while requiring far fewer computational resources. PremPLI can be used for guiding the design of ligand-binding proteins, identifying and understanding disease driver mutations, and finding potential resistance mutations for different drugs. PremPLI is freely available at https://lilab.jysw.suda.edu.cn/research/PremPLI/ and supports large-scale mutational scanning.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. A flowchart highlighting important steps in the methodology.
(1) Collecting and processing the experimental data used for training, (2) producing and optimizing 3D structures of wild-type and mutant protein-ligand complexes used for calculating structure-based features, (3) calculating around 400 features and selecting distinct features that contribute substantially to model quality, and (4) building the PremPLI machine learning model with the random forest algorithm, trained on experimental data.
Fig. 2. Overview of the data sets used.
S796: visualization of four types of protein–ligand complex structures, and the distribution of the number of complexes and unique ligands across different numbers of mutations. The majority of complexes contain a single mutation. S144: 3D structure of human Abl kinase with axitinib bound (PDB ID: 4WA9; mutation sites are shown in red), names and chemical structures of eight tyrosine kinase inhibitors (TKIs), and the number of mutations for each type of inhibitor. S99: the number of complexes and mutations for each type of complex structure, statistics of the types of mutations (see Supplementary Fig. 1 for the definition), and the distribution of molecular weight and number of rotatable bonds for the ligands in S796 and S99. See Supplementary Figs. 1 and 2 for more information about the data sets.
Fig. 3. Pearson correlation coefficients between experimental and calculated changes in binding affinity.
a PremPLI trained and tested on the S796 dataset, b ten repetitions of 5-fold and 10-fold cross-validation (CV1 and CV2), c leave-one-complex-out validation (CV3), and d leave-one-type-ligand-out validation (CV4). PCC: Pearson correlation coefficient; RMSE: root-mean-square error (kcal mol−1).
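The two metrics named in this caption, and the grouped hold-out scheme behind CV3 and CV4, can be illustrated with a minimal sketch. It is not the authors' code: the function names `pearson_rmse` and `leave_one_group_out` are hypothetical, the group labels stand in for complex or ligand identifiers, and only the Python standard library is assumed.

```python
import math


def pearson_rmse(experimental, predicted):
    """Return (PCC, RMSE) for paired affinity changes in kcal/mol."""
    n = len(experimental)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(experimental) / n
    mean_y = sum(predicted) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(experimental, predicted))
    var_x = sum((x - mean_x) ** 2 for x in experimental)
    var_y = sum((y - mean_y) ** 2 for y in predicted)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined when either sequence is constant")
    pcc = cov / math.sqrt(var_x * var_y)
    rmse = math.sqrt(sum((x - y) ** 2 for x, y in zip(experimental, predicted)) / n)
    return pcc, rmse


def leave_one_group_out(groups):
    """Yield (held_out, train_indices, test_indices) for each distinct group label.

    In a leave-one-complex-out setting the label would be a complex identifier;
    in a leave-one-ligand-type-out setting it would be a ligand identifier.
    """
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test
```

Holding out every mutation from one complex (or one ligand) at a time keeps closely related data points from appearing in both the training and test sets, which is why such splits tend to give a more conservative estimate than random k-fold splits.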
Fig. 4. Receiver operating characteristic (ROC) and precision-recall (PR) curves for different methods to distinguish resistance from other mutations.
The number of resistance mutations is 19 for both S144 (a) and S99 (b) datasets.
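As a rough illustration of how ROC and PR curves summarize a classifier that separates a small number of resistance mutations from the rest, the sketch below computes the area under the ROC curve and the average precision from labels and scores. It is not the authors' evaluation code: the function names are hypothetical, a label of 1 is assumed to mark a resistance mutation, and a higher score is assumed to mean "more likely resistance".

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example.

    Tied scores are kept in input order (stable sort); this is a simplification.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits
```

With few positives among many negatives, as in the data sets described here, the precision-recall summary is usually the more sensitive of the two, because the ROC curve is barely affected by a large pool of easy negatives.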
Fig. 5. Performance for different methods tested on tyrosine kinase inhibitors.
a Pearson correlation coefficient for each tyrosine kinase inhibitor. * and ** indicate a PCC that differs significantly from zero, with p-value < 0.05 and p-value < 0.01 (t-test), respectively. b Performance of six different methods tested on the Abl-axitinib complex. PCC and RMSE in red: all 26 mutations; PCC and RMSE in black: after removing the three mutations shown as red dots.
Fig. 6. PremPLI server.
The three input steps, (a) enter a Protein Data Bank (PDB) code or upload a coordinate file, (b) select interaction partners, and (c) assign mutations, are shown together with the results pages (d). “Processing time” refers to the running time of a job, excluding the waiting time in the queue.
