PLoS Comput Biol. 2013;9(9):e1003216.
doi: 10.1371/journal.pcbi.1003216. Epub 2013 Sep 5.

Characterizing changes in the rate of protein-protein dissociation upon interface mutation using hotspot energy and organization

Rudi Agius et al. PLoS Comput Biol. 2013.

Abstract

Predicting the effects of mutations on the kinetic rate constants of protein-protein interactions is central to both the modeling of complex diseases and the design of effective peptide drug inhibitors. However, while most studies have concentrated on the determination of association rate constants, dissociation rates have received less attention. In this work we take a novel approach by relating the changes in dissociation rates upon mutation to the energetics and architecture of hotspots and hotregions, by performing alanine scans pre- and post-mutation. From these scans, we design a set of descriptors that capture the change in hotspot energy and distribution. The method is benchmarked on 713 kinetically characterized mutations from the SKEMPI database. Our investigations show that, with the use of hotspot descriptors, energies from single-point alanine mutations may be used to estimate off-rate changes for mutations to any residue type and also for multi-point mutations. A number of machine learning models are built from a combination of molecular and hotspot descriptors, with the best models achieving a Pearson's Correlation Coefficient of 0.79 with experimental off-rates and a Matthews Correlation Coefficient of 0.6 in the detection of rare stabilizing mutations. Using specialized feature selection models we identify descriptors that are highly specific and, conversely, broadly important to predicting the effects of different classes of mutations, interface regions and complexes. Our results also indicate that the distribution of the critical stability regions across protein-protein interfaces is a function of complex size more strongly than interface area. In addition, mutations at the rim are critical for the stability of small complexes, but consistently harder to characterize. The relationship between hotregion size and the dissociation rate is also investigated and, using hotspot descriptors which model cooperative effects within hotregions, we show how the contribution of hotregions of different sizes changes under different cooperative effects.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Off-rate estimation using hotspot energies and organization.
In this work we generate a set of hotspot descriptors for characterizing off-rate changes upon mutation. The hotspot descriptors use single-point alanine ΔΔGs from computational alanine scans, generated using hotspot prediction algorithms, to predict changes in off-rate upon single-point and multi-point mutations to all residue types. To do so, for a given wild-type complex structure, the interface is scanned for hotspots using a hotspot prediction algorithm. The single-point alanine ΔΔGs from the scan are extracted and stored. Next, the structural mutation in question is applied and the mutated interface is re-scanned for hotspots. This generates a new set of single-point alanine ΔΔGs for the mutated interface. Note that the mutation in question may also affect the hotspot energies of neighboring residues which are not mutated. The two sets of ΔΔGs are then used to generate a set of hotspot descriptors, where the final hotspot descriptor value is the change in the descriptor's value from mutant to wild-type. For example, in the case of Int_HS_Energy, the final value is the change in the sum of the ΔΔGs of all hotspot residues, pre- and post-mutation. Hotspots are also categorized into core, rim, support and hotregions. This enables us to investigate and account for cooperative effects within hotregions and to identify differences in regions critical for stability, both on complexes of different size and interface area.
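The pre- and post-mutation comparison described in this caption can be sketched in a few lines. This is only an illustration under stated assumptions: the 2.0 kcal/mol hotspot cutoff, the residue identifiers, the ΔΔG values and the mutant-minus-wild-type sign convention are hypothetical and are not taken from the paper.

```python
from typing import Dict

# Assumed cutoff (kcal/mol) above which a residue is treated as a hotspot.
HOTSPOT_CUTOFF = 2.0


def hotspot_energy(scan: Dict[str, float], cutoff: float = HOTSPOT_CUTOFF) -> float:
    """Sum the alanine-scan ddG values of residues classed as hotspots."""
    return sum(ddg for ddg in scan.values() if ddg >= cutoff)


def int_hs_energy(wild_type: Dict[str, float], mutant: Dict[str, float],
                  cutoff: float = HOTSPOT_CUTOFF) -> float:
    """Change in summed hotspot energy, taken here as mutant minus wild-type."""
    return hotspot_energy(mutant, cutoff) - hotspot_energy(wild_type, cutoff)


# Hypothetical scans: residue id -> predicted single-point alanine ddG.
wt_scan = {"A:TYR38": 3.1, "A:ASP40": 2.4, "B:LYS15": 0.7}
# After mutating A:ASP40, the energy of a neighbouring residue shifts as well.
mut_scan = {"A:TYR38": 2.6, "A:ASP40": 0.3, "B:LYS15": 0.9}

delta = int_hs_energy(wt_scan, mut_scan)
print(round(delta, 2))
```

In this toy case the mutated residue drops out of the hotspot set and an unmutated neighbour loses energy, so the descriptor is negative.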
Figure 2. Relationship of off-rate changes upon mutation with change in binding free energy and change in interface hotspot energy.
(A) The relationship between experimental values for Δlog10(koff) and ΔΔG for all the 713 mutations in the SKEMPI off-rate dataset. (B) The relationship between changes in interface hotspot energies, as predicted by the RFSpot_KFC2 hotspot predictor, and Δlog10(koff) for all the 713 mutations in the SKEMPI off-rate dataset. Note that 50% of off-rate mutants in this dataset involve mutations to non-alanine residues and include multi-point mutants. In turn, Int_HS_Energy characterizes these changes with the use of single-point alanine ΔΔGs as highlighted in Figure 1.
Figure 3. Hotspot and molecular descriptors for estimating change in off-rate.
The hotspot descriptors designed in this work are benchmarked against a set of 110 molecular descriptors, both in their ability to estimate Δlog10(koff) and in their ability to detect stabilizing mutations with Δlog10(koff) < −1. The performance measures shown here enable us to assess the raw predictive power of the descriptors independent of any learning models. Green and black bars highlight descriptors from the hotspot and molecular descriptor sets respectively. (A) Comparison of the distribution of the absolute PCC values for the hotspot descriptors designed in this work against that for the molecular descriptors. The related list of descriptor names and their respective PCCs is found in Text S5. (B) Top 10 hotspot descriptors and top 10 molecular descriptors according to absolute PCC with experimental Δlog10(koff). (C) Mann-Whitney U-test rankings for all descriptors, where values are ranked according to −log10(pval) and represent the ability of the descriptors to discriminate stabilizing mutants (Δlog10(koff) < −1) from neutral to destabilizing mutants (Δlog10(koff) > 0). This dataset contains 31 stabilizing mutants and 503 neutral to destabilizing mutants (referred to as CDS1). (D) Matthews Correlation Coefficient (MCC) rankings for all descriptors on the same dataset. (E) and (F) are identical to (C) and (D) except that results are for off-rates that satisfy |Δlog10(koff)| > 1. This dataset contains 31 stabilizing mutants and 213 destabilizing mutants (referred to as CDS2).
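The two classification datasets and the MCC score used in this caption can be illustrated as follows. This is a rough sketch: the function names and the example labels are hypothetical, and only the thresholds (Δlog10(koff) < −1, > 0 and |Δlog10(koff)| > 1) come from the caption.

```python
import math
from typing import Optional, Sequence


def label_cds1(dlog_koff: float) -> Optional[int]:
    """CDS1: 1 = stabilizing (< -1), 0 = neutral to destabilizing (> 0)."""
    if dlog_koff < -1:
        return 1
    if dlog_koff > 0:
        return 0
    return None  # values in [-1, 0] belong to neither class


def label_cds2(dlog_koff: float) -> Optional[int]:
    """CDS2: keep only |dlog_koff| > 1; 1 = stabilizing, 0 = destabilizing."""
    if dlog_koff < -1:
        return 1
    if dlog_koff > 1:
        return 0
    return None  # near-neutral mutations are excluded


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews Correlation Coefficient for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical experimental values and predicted labels.
values = [-1.5, -0.4, 0.2, 1.3, -2.0, 0.8]
cds1 = [label_cds1(v) for v in values]
cds2 = [label_cds2(v) for v in values]
score = mcc([1, 0, 0, 1, 0], [1, 0, 0, 0, 0])
print(cds1, cds2, round(score, 3))
```

MCC is a natural choice here because the stabilizing class is rare (31 mutants against several hundred), and a score based on all four confusion-matrix cells is less easily inflated by the majority class than plain accuracy.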
Figure 4. Hotspot and molecular descriptor scatter plots.
The relationship between experimental values for Δlog10(koff) and (A) the hotspot descriptor showing highest correlation with Δlog10(koff) (SuppHSEnergyKFC2a - changes in hotspot energies in the support region as predicted by KFC2a [30]), (B) the molecular descriptor showing highest correlation with Δlog10(koff) (AP_MPS - the DARS atomic potential [54]), (C) the top performing hotspot descriptor for the detection of stabilizing mutants (HSEner_PosCoopRFSpot – changes in hotspot energies on accounting for positive cooperativity in hotregions) and (D) the top performing molecular descriptor for the detection of stabilizing mutants (CP_TB – coarse grained protein-protein docking potential).
Figure 5. Off-rate prediction models using hotspot and molecular descriptors.
A number of RF regression and classification models are built using different sets of hotspot and molecular descriptors. The prediction accuracy is also assessed on subsets of mutations defined as data regions. The data regions enable us to identify classes of mutations which are consistently harder to characterize, data set biases and prediction patterns. (A) PCC values for off-rate model predictions with Δlog10(koff). Models use hotspot descriptors, or a combination of hotspot and molecular descriptors. The different methods indicate the hotspot prediction method from which the hotspot descriptors were generated. (B) Data region analysis of predictions from each model. The predictions from each model are divided into the respective categories shown on the x-axis, and values in the matrix show the PCC achieved by the given model for the given data region. (C) MCC values for off-rate classifier model predictions for classification data sets CDS1 in blue and CDS2 in red. CDS1 includes neutral mutations whereas CDS2 excludes neutral mutations; hence the detection of stabilizing mutants is enhanced in the latter, though results for CDS1 are more relevant for interface design scenarios. (D–F) are similar to (A–C) except that off-rate prediction models using subsets of molecular descriptors are investigated. CP – Coarse-Grain Potentials; AP – Atomic-Based Potentials; CP-AP – All Statistical Potentials; PB – Physics Based Energy Terms. As a benchmark comparison, results for RFSpot_KFC2Off-Rate (best performing off-rate predictor using hotspot descriptors) and RFSpot_KFC2Off-Rate+MOL (best performing off-rate predictor using hotspot and molecular descriptors) are also included in (D–F).
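The data region analysis in panel (B) amounts to splitting one model's predictions by category and computing a PCC per subset. A minimal sketch of that bookkeeping follows; the region names and all numeric values are hypothetical placeholders, not results from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's Correlation Coefficient; NaN if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def pcc_by_region(records: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """records: (data region, experimental value, predicted value)."""
    experimental: Dict[str, List[float]] = defaultdict(list)
    predicted: Dict[str, List[float]] = defaultdict(list)
    for region, exp_value, pred_value in records:
        experimental[region].append(exp_value)
        predicted[region].append(pred_value)
    return {r: pearson(experimental[r], predicted[r]) for r in experimental}


# Hypothetical predictions, already produced by a single trained model.
records = [
    ("core", 0.5, 0.4), ("core", 1.5, 1.2), ("core", -1.0, -0.8),
    ("rim", 0.2, 0.6), ("rim", 0.9, 0.1), ("rim", -0.3, 0.2),
]
per_region = pcc_by_region(records)
print({region: round(value, 2) for region, value in per_region.items()})
```

Because the split happens after prediction, a low PCC in one region reflects how hard that class of mutation is for the shared model rather than a difference in training.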
Figure 6. Off-rate prediction model scatter plots.
The relationship between experimental and predicted values for Δlog10(koff) with (A) RFSpot_KFC2Off-Rate+MOL, the best performing off-rate prediction model combining hotspot and molecular descriptors. Hotspot descriptors for this model are generated using the RFSpot_KFC2 hotspot prediction algorithm. (B) RFSpot_KFC2Off-Rate, the best performing off-rate prediction model using only hotspot descriptors. Hotspot descriptors for this model are again generated using the RFSpot_KFC2 hotspot prediction algorithm. (C) MolecularOff-Rate, the off-rate prediction model using molecular descriptors. The addition of hotspot descriptors, as observed in (A), to the molecular descriptor model shown in (C) notably improves the prediction of stabilizing mutants, which are all found in the lower left quadrant for RFSpot_KFC2Off-Rate+MOL.
Figure 7. Detection of rare complex stabilizing mutations using off-rate classification models.
(A) Ranked list of the 31 stabilizing mutations (Δlog10(koff) < −1) in the SKEMPI off-rate dataset. The list is ranked according to the number of off-rate classification models that detect the mutation in question as stabilizing. (B) Detections per model are highlighted in white, and non-detections in black. The lower portion of (A) is dominated by single-point mutations to alanine residues, which suggests that the stabilizing effects of these mutations, as opposed to their more common neutralizing/destabilizing effects, are much harder to characterize.
Figure 8. Specialized feature selection models and descriptor-data region networks.
Feature selection models using a genetic algorithm (GA-FS) are run for different data regions of the off-rate dataset, for which both linear (using Linear Regression) and non-linear (using SVM regression) models are investigated. For each data region, the GA-FS is run 50 times and is designed to find an optimal feature set of size 5. Initial features available in the population are the 110 molecular descriptors and 16 hotspot descriptors generated by RFSpot_KFC2. An inner cross-validation loop is used as a scoring function for driving the feature selection, whereas an outer cross-validation loop is used to assess the model prediction accuracy. (A) and (B) show the importance of the most selected features for each data region. The features shown are those that are part of the final model for any data region in more than 50% of the GA-FS runs, and the color bar displays this percentage. The features on the y-axis are ordered as: coarse-grain potentials, atomic-based potentials, physics-based energy terms and hotspot descriptors. (C) and (D) are descriptor-data region networks for (A) and (B) respectively. Circled nodes represent data regions and square nodes represent features; therefore, only edges between circle and square nodes are present. An edge is present if the feature is in the final model for the given data region in more than 50% of the GA-FS runs (dotted edge), between 70–90% of the GA-FS runs (normal edge), or more than 90% of the GA-FS runs (bold edge). Node colors denote coarse-grain potentials (blue), atomic-based potentials (yellow), physics-based energy terms (green), hotspot descriptors (pink) and data regions (gray). From the descriptor-data region networks, descriptors highly specific to certain classes of off-rate mutations can be observed. Conversely, as in the case of the GA-FS (SVM) data region network, a cluster of broadly-predictive hotspot descriptors is also shown. (E) Mean PCC of the optimal models found by the GA-FS runs for each data region. For comparison, PCC results on the data regions are also shown for RFSpot_KFC2Off-Rate+MOL. Note that the latter model is trained on all 713 off-rate mutations, and the predictions are separated post-prediction into data regions and analyzed for their PCC. This effectively compares the predictions of specialized models against a one-fits-all model. Though we find no evidence that specialized models in general perform better than a one-fits-all model, certain subsets of mutations, such as those at the rim regions, show notable improvements when a specialized model is employed.
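The edge rules of the descriptor-data region networks in (C) and (D) can be expressed as a small lookup on how often a feature was selected across the 50 GA-FS runs. The sketch below is illustrative only: the region and feature names and the counts are placeholders, and the handling of the exact 70% and 90% boundaries is an assumption, since the caption does not specify it.

```python
from typing import Dict, Optional, Tuple


def edge_style(selection_fraction: float) -> Optional[str]:
    """Map a feature's selection frequency to the edge style in the caption."""
    if selection_fraction > 0.9:
        return "bold"
    if selection_fraction >= 0.7:  # assumed: exactly 70% and 90% count as normal
        return "normal"
    if selection_fraction > 0.5:
        return "dotted"
    return None  # selected too rarely: no edge is drawn


def build_edges(counts: Dict[Tuple[str, str], int],
                n_runs: int = 50) -> Dict[Tuple[str, str], str]:
    """counts: (data region, feature) -> number of runs selecting the feature."""
    edges = {}
    for pair, count in counts.items():
        style = edge_style(count / n_runs)
        if style is not None:
            edges[pair] = style
    return edges


# Placeholder selection counts out of 50 GA-FS runs.
counts = {
    ("region_1", "feature_a"): 48,
    ("region_1", "feature_b"): 36,
    ("region_2", "feature_b"): 27,
    ("region_2", "feature_c"): 10,
}
edges = build_edges(counts)
print(edges)
```

Since every edge joins a data region to a feature, the resulting graph is bipartite, which matches the caption's remark that only circle-to-square edges are present.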
Figure 9. Stability regions, interface-area and complex-size.
The changes in hotspot energies upon mutation are assessed at three interface regions, which enables us to explore changes in the distribution of stability for complexes of different size and interface area. CORE, RIM and SUPP represent the PCCs of CoreHSEnergy/RimHSEnergy/SuppHSEnergy with Δlog10(koff), averaged over the 6 hotspot prediction algorithms. (A) PCCs for mutants on complexes with interface area >1600 Å² (LIA). (B) PCCs for mutants on complexes with interface area <1600 Å² (SIA). (C) PCCs for mutants on complexes with size <500 residues (SCS). (D) PCCs for mutants on complexes with size >500 residues (LCS). (E) LIA-SCS, (F) LIA-LCS, (G) SIA-SCS, (H) SIA-LCS. (I) Scatter plot of complex size vs. interface area for all complexes in the off-rate mutant dataset. Here it is observed that complex stability is distributed across all three regions for small complexes (C, E and G), whereas the core becomes a localized region of stability for large complexes (D, F, H). On analysis of the interface area vs. complex size subsets (E–H), the distribution of stability regions is affected primarily by complex size, irrespective of interface area.
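The four subsets in panels (E–H) follow from two thresholds, which can be written as a simple classifier. This is a sketch for orientation only: the function name is hypothetical, and assigning the exact boundary values (1600 Å², 500 residues) to the "small" categories is an assumption, as the caption leaves them unspecified.

```python
AREA_CUTOFF = 1600.0  # interface area in square angstroms, from the caption
SIZE_CUTOFF = 500     # complex size in residues, from the caption


def classify_complex(interface_area: float, n_residues: int) -> str:
    """Return a label such as 'LIA-SCS' for a complex."""
    # Assumption: values exactly at a cutoff fall into the small category.
    area_label = "LIA" if interface_area > AREA_CUTOFF else "SIA"
    size_label = "LCS" if n_residues > SIZE_CUTOFF else "SCS"
    return f"{area_label}-{size_label}"


# Hypothetical complexes: (interface area, number of residues).
examples = [(2100.0, 320), (1900.0, 780), (1200.0, 250), (1400.0, 900)]
labels = [classify_complex(area, size) for area, size in examples]
print(labels)
```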
Figure 10. Effects of cooperativity on effective energetic contribution of hotregions.
The summation of single-point alanine ΔΔGs of a hotregion may underestimate or overestimate its contribution if negative or positive cooperative effects, respectively, are at play. In this work, in order to account for potential cooperative effects, the hotspot descriptors HSEner_PosCoop and HSEner_NegCoop apply linearly decreasing and increasing weights respectively to single-point alanine ΔΔGs within a hotregion. In turn, Int_HS_Energy assumes that the hotspot residues within a hotregion are additive and does not apply any weights. Here, the effects of accounting for cooperative or additive behavior on the predicted hotspot and hotregion energies are shown for all mutated complexes used in this work. (A) The mean hotspot energies for hotregion sizes of 1 to 8 hotspot residues. Each column shows the predictions of a different hotspot predictor. The first row (blue) shows the raw mean hotspot energies, which essentially assumes all hotspots are additive within a hotregion. The second row (red) assumes negative cooperativity within hotregions. To account for negative cooperativity, a linearly increasing weight is applied to the hotspot energies according to the size of the hotregion they are in (see Materials and Methods). The third row (green) assumes positive cooperativity within hotregions, and a linearly decreasing weight is applied to the hotspot energies according to the size of the hotregion. (B) is similar to (A) but values are now the mean of the total hotregion energy for the given size. Effectively, the additive hotspot energy assumption results in hotregions contributing in a linearly increasing manner according to their size, the negative cooperativity assumption results in hotregions contributing in an increasing exponential-like manner as the hotregions increase in size, and the positive cooperativity assumption results in hotregions reaching a maximum contribution at around a hotregion size of 5, with their contribution decreasing beyond.
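To make the three weighting schemes concrete, the sketch below applies a size-dependent weight to the summed ΔΔGs of a hotregion. The actual weighting function is defined in the paper's Materials and Methods and is not reproduced here; the linear form `1 ± slope * (size - 1)`, the slope of 0.1 and the uniform 2.0 kcal/mol hotspot energies are all assumptions chosen purely for illustration.

```python
from typing import List


def cooperativity_weight(region_size: int, mode: str, slope: float = 0.1) -> float:
    """Hypothetical linear weight for hotspots in a hotregion of a given size."""
    step = slope * (region_size - 1)
    if mode == "additive":
        return 1.0
    if mode == "negative":  # increasing weight with hotregion size
        return 1.0 + step
    if mode == "positive":  # decreasing weight, floored at zero
        return max(0.0, 1.0 - step)
    raise ValueError(f"unknown mode: {mode}")


def hotregion_energy(ddgs: List[float], mode: str, slope: float = 0.1) -> float:
    """Total weighted energy of one hotregion from its single-point ddGs."""
    return cooperativity_weight(len(ddgs), mode, slope) * sum(ddgs)


# Hotregions of size 1 to 8, each hotspot given the same assumed ddG.
for size in range(1, 9):
    region = [2.0] * size
    totals = [hotregion_energy(region, m) for m in ("additive", "negative", "positive")]
    print(size, [round(t, 2) for t in totals])
```

With this arbitrary slope the additive total grows linearly, the negative-cooperativity total grows faster than linearly, and the positive-cooperativity total rises and then falls at larger sizes, which is qualitatively the behavior the caption describes for panel (B).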
Figure 11. Effects of conformational changes and off-rate prediction.
Predictions of the original 13 regression models developed for off-rate prediction. The predictions are assessed separately (PCC with Δlog10(koff)) for mutations on complexes which undergo significant backbone conformational changes of I_RMSD >1.5 Å (dark green), notable conformational changes of I_RMSD >1 Å (light green) and little to no conformational change, I_RMSD <1 Å (dark blue). Prediction accuracy is directly related to the magnitude of conformational change and becomes highly dependent on the model at higher levels of conformational change. I_RMSD values were extracted from our previous work on the construction of a protein-protein affinity database.

References

    1. Cheng TM, Goehring L, Jeffery L, Lu YE, Hayles J, et al. (2012) A structural systems biology approach for quantifying the systemic consequences of missense mutations in proteins. PLoS Comput Biol 8: e1002738.
    2. Kiel C, Serrano L (2009) Cell type-specific importance of ras-c-raf complex association rate constants for MAPK signaling. Sci Signal 2: ra38.
    3. Cloutier M, Wang E (2011) Dynamic modeling and analysis of cancer cellular network motifs. Integr Biol (Camb) 3: 724–732.
    4. Schmierer B, Tournier AL, Bates PA, Hill CS (2008) Mathematical modeling identifies Smad nucleocytoplasmic shuttling as a dynamic signal-interpreting system. Proc Natl Acad Sci U S A 105: 6608–6613.
    5. Cheng TM, Gulati S, Agius R, Bates PA (2012) Understanding cancer mechanisms through network dynamics. Brief Funct Genomics 11: 543–560.