J Comput Aided Mol Des. 2018 Oct;32(10):1203-1216.
doi: 10.1007/s10822-018-0138-6. Epub 2018 Aug 6.

SAMPL6: calculation of macroscopic pKa values from ab initio quantum mechanical free energies

Free PMC article

Edithe Selwa et al. J Comput Aided Mol Des. 2018 Oct.

Abstract

Macroscopic pKa values were calculated for all compounds in the SAMPL6 blind prediction challenge, based on quantum chemical calculations with a continuum solvation model and a linear correction derived from a small training set. Microscopic pKa values were derived from the gas-phase free energy difference between protonated and deprotonated forms together with the Conductor-like Polarizable Continuum Solvation Model and the experimental solvation free energy of the proton. pH-dependent microstate free energies were obtained from the microscopic pKas with a maximum likelihood estimator and appropriately summed to yield macroscopic pKa values or microstate populations as a function of pH. We assessed three approaches to calculating the microscopic pKas: direct use of the quantum mechanical (QM) free energy differences, and correction of the direct values for shortcomings in the QM solvation model with either of two linear models, each derived independently from a small training set of 38 compounds with known pKa. The predictions that were corrected with the linear models were much more accurate [root-mean-square error (RMSE) 2.04 and 1.95 pKa units] than the direct calculation (RMSE 3.74). Statistical measures indicate that some systematic errors remain, likely due to differences between the SAMPL6 data set and the small training set with respect to their interactions with water. Overall, the current approach provides a viable physics-based route to estimate macroscopic pKa values for novel compounds with reasonable accuracy.

Keywords: Quantum chemistry; SAMPL challenge; pH; pKa.
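
The chain from free energies to macroscopic pKa values described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it skips the maximum likelihood estimator and assumes the microscopic pKas are already thermodynamically consistent, so that each microstate has a single log10 stability relative to a reference state. All function names, the example pKa values, and the choice of kcal/mol at 298.15 K are assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K); unit choice is an assumption


def pka_from_free_energy(delta_g_kcal, temperature=298.15):
    """Textbook relation pKa = dG / (RT ln 10) for an aqueous deprotonation free energy."""
    return delta_g_kcal / (R_KCAL * temperature * math.log(10))


def microstate_populations(log_stabilities, n_protons, ph):
    """Populations of microstates at one pH.

    log_stabilities[i] is the sum of microscopic pKas along a deprotonation
    path from microstate i to the reference microstate (0 for the reference).
    n_protons[i] is the number of protons microstate i carries beyond the reference.
    """
    exponents = [c - n * ph for c, n in zip(log_stabilities, n_protons)]
    top = max(exponents)  # shift by the maximum to avoid overflow
    weights = [10.0 ** (e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def macroscopic_pka(log_stabilities, n_protons, n):
    """pH at which all microstates with n protons together are as populated
    as all microstates with n - 1 protons together."""
    upper = sum(10.0 ** c for c, k in zip(log_stabilities, n_protons) if k == n)
    lower = sum(10.0 ** c for c, k in zip(log_stabilities, n_protons) if k == n - 1)
    return math.log10(upper) - math.log10(lower)


# Hypothetical diprotic acid with two tautomers of the singly protonated form.
# Microstates: A (reference), HA_a, HA_b, H2A. The pKa values are made up.
stabilities = [0.0, 9.0, 9.0, 14.0]
protons = [0, 1, 1, 2]
pka_first = macroscopic_pka(stabilities, protons, 2)   # H2A -> {HA_a, HA_b}
pka_second = macroscopic_pka(stabilities, protons, 1)  # {HA_a, HA_b} -> A
populations_at_ph7 = microstate_populations(stabilities, protons, 7.0)
```

Because two tautomers share the singly protonated level, the first macroscopic pKa in this example lies below the microscopic value of 5 and the second lies above the microscopic value of 9. This is the usual consequence of summing over microstates with the same number of protons.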

Figures

Fig. 1
Chemical structures of the SAMPL6 data set. SM20 is the only compound that contains a single titratable proton; all other compounds contain multiple titratable protons and, in some cases, tautomers.
Fig. 2
Chemical structures of the QM1 training data set (neutral acids); see also Table 1.
Fig. 3
Chemical structures of the QM2 training data set (cationic acids); see also Table 1.
Fig. 4
Training data set. The pKa values of the training data set compounds are used to derive a simple linear model that relates the free energy correction ΔGcorr* to the experimental pKa. Two linear models were derived: a global linear model (black dashed line), utilizing all data, and a piecewise linear model that applies to either neutral acids (subset QM1, blue) or to positively charged acids (subset QM2, green). a: Correlation between experimental and calculated pKa of the training data set. The dashed line indicates ideal correlation, with the gray band indicating 1 pKa unit deviation. b: Global linear fit of the calculated ΔGcorr* to the experimental pKa. c: Linear fits of the calculated ΔGcorr* to the experimental pKa, split between the QM1 and the QM2 subsets. In (b) and (c) the dashed lines are linear fits to the data, with shaded bands indicating 95% confidence intervals from 1000 bootstrap samples.
Fig. 5
Signed error Δid of individual predictions. The calculated pKa was matched to the experimental pKa for each compound (indicated by the SAMPL6 pKa ID) and the deviation from the experimental value is represented as a bar. Observations for the same compound have the same color. a: pKa values were directly estimated from the quantum mechanical free energy differences. b: The quantum mechanical pKa values were corrected with the global linear model. c: Compounds were corrected with the piecewise linear model, depending on their membership in subset 1 or 2.
Fig. 6
Correlation between experimental and calculated pKa values for the SAMPL6 compounds. a: pKa values were directly estimated from the quantum mechanical free energy differences. b: The quantum mechanical pKa values were corrected with the global linear model. c: Compounds were corrected depending on their membership in subset 1 or 2. The black dashed line indicates ideal correlation; the shaded green bars show 0.5 and 1 pKa units deviation from ideal. Blue lines are linear regression fits to the data, with the blue shaded area indicating the 95% confidence interval from 1000 bootstrap samples.
Fig. 7
Comparison of chemical properties of the training (light blue) and SAMPL6 (orange) data sets. a: normalized histograms of the number of rotatable bonds; b: normalized histograms of the number of hydrogen bond acceptors; c: correlation between the number of heavy atoms and the number of acceptors with linear regressions shown as solid lines and their 95% confidence interval from 1000 bootstraps indicated by shaded areas.
Fig. 8
Microstate probabilities p_i for SM14. a: Computed microstate probabilities (for the piecewise linear fit) are shown as heavy solid lines and experimentally derived probabilities as thin dashed lines. The experimental p_i were calculated in the same way as the calculated ones (Eq. 19) by directly using the experimental microstate pKas. b: Microstate diagram with arrows indicating deprotonation. Bold numbers near solid arrows are the calculated microstate pKa (from (a)) and italic numbers near dashed arrows are the experimental numbers, assigned to the experimentally identified microstate transitions. The gray solid arrows with gray bold numbers indicate the calculated macroscopic pKa from N = 3 protons (microstate SM14_micro003) to N = 2 protons (mixture of SM14_micro002 and SM14_micro004, indicated by the orange box) to N = 1 proton in SM14_micro001 (and SM14_micro005, which is not shown because computation and experiment indicate that it is suppressed relative to SM14_micro001).
Fig. 9
RMSE of all SAMPL6 submissions (blue), including our new calculations for all SAMPL6 compounds (red) and, for completeness, our original submissions (gray). The original submissions only included predictions for SM15, SM20, and SM22 and are of limited statistical validity because the RMSE itself has a large variance for only three samples [37]. The submission IDs p0jba and xxxc correspond to the piecewise linear model, 35bdm and xxxb to the global linear model, and xxxa to directly using the quantum chemical free energies. Other IDs belong to other regular SAMPL6 submissions. The error bars indicate 95% confidence intervals from 1000 bootstrap samples.
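
Several captions above report 95% confidence intervals from 1000 bootstrap samples. The following Python sketch shows one common way to obtain such an interval for the RMSE, namely a percentile bootstrap over (calculated, experimental) pairs. The captions do not say which bootstrap variant was used, so the percentile method, the function names, and the fixed seed are assumptions, and the numbers in the example are made up.

```python
import math
import random


def rmse(pred, ref):
    """Root-mean-square error between calculated and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def bootstrap_rmse_ci(pred, ref, n_boot=1000, level=0.95, seed=0):
    """Point estimate and percentile-bootstrap confidence interval of the RMSE.

    Pairs are resampled with replacement, so each bootstrap sample has the
    same size as the original data set.
    """
    rng = random.Random(seed)
    n = len(pred)
    samples = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(rmse([pred[i] for i in idx], [ref[i] for i in idx]))
    samples.sort()
    lo = samples[int(round((1 - level) / 2 * (n_boot - 1)))]
    hi = samples[int(round((1 + level) / 2 * (n_boot - 1)))]
    return rmse(pred, ref), lo, hi


# Made-up calculated and experimental pKa values, for illustration only.
calculated = [4.1, 6.8, 9.5, 3.2, 7.7, 10.4]
experimental = [3.6, 7.9, 8.1, 4.0, 7.5, 12.0]
point, lower, upper = bootstrap_rmse_ci(calculated, experimental)
```

With only a handful of pairs the interval becomes very wide, which is the point made in the Fig. 9 caption about submissions covering just three compounds.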

References

    1. Nicholls A, Mobley DL, Guthrie JP, Chodera JD, Bayly CI, Cooper MD, Pande VS (2008) Predicting small-molecule solvation free energies: An informal blind test for computational chemistry. J Med Chem 51(4):769–779, DOI 10.1021/jm070549+
    2. Guthrie JP (2009) A blind challenge for computational solvation free energies: Introduction and overview. J Phys Chem B 113(14):4501–4507, DOI 10.1021/jp806724u
    3. Geballe MT, Skillman AG, Nicholls A, Guthrie JP, Taylor PJ (2010) The SAMPL2 blind prediction challenge: Introduction and overview. J Comput Aided Mol Des 24(4):259–279, DOI 10.1007/s10822-010-9350-8
    4. Geballe MT, Guthrie JP (2012) The SAMPL3 blind prediction challenge: transfer energy overview. J Comput Aided Mol Des 26(5):489–496, DOI 10.1007/s10822-012-9568-8
    5. Mobley DL, Wymer KL, Lim NM, Guthrie JP (2014) Blind prediction of solvation free energies from the SAMPL4 challenge. J Comput Aided Mol Des 28(3):135–150, DOI 10.1007/s10822-014-9718-2