Background: pKa values are a measure of the protonation of ionizable groups in proteins. Ionizable groups are involved in intra-protein, protein-solvent and protein-ligand interactions as well as solubility, protein folding and catalytic activity. The pKa shift of a group from its intrinsic value is determined by the perturbation of the residue by the environment and can be calculated from three-dimensional structural data.
Results: Here we use a large dataset of experimentally-determined pKas to analyse the performance of different prediction techniques. Our work provides a benchmark of available software implementations: MCCE, MEAD, PROPKA and UHBD. Combinatorial and regression analysis is also used in an attempt to find a consensus approach towards pKa prediction. The tendency of individual programs to over- or underpredict the pKa value is related to the underlying methodology of the individual programs.
Conclusion: Overall, PROPKA is more accurate than the other three programs. Key to developing accurate predictive software will be a complete sampling of conformations accessible to protein structures.