A clustering-based biased Monte Carlo approach to protein titration curve prediction

Proc Int Conf Mach Learn Appl. 2020 Dec;2020:10.1109/icmla51294.2020.00037. doi: 10.1109/icmla51294.2020.00037. Epub 2021 Feb 23.


In this work, we developed an efficient approach to computing ensemble averages in systems with pairwise-additive energetic interactions between entities. Methods that fully enumerate the configuration space have exponential complexity. Sampling methods such as Markov Chain Monte Carlo (MCMC) algorithms have been proposed to avoid this cost; however, when significant energetic coupling exists between the entities, the efficiency of such algorithms can be diminished. We improved the efficiency of MCMC by using the cluster structure of the interaction energy matrix to bias the sampling. We pursued two different schemes for the biased MCMC runs and showed that both are valid MCMC schemes. Using both synthesized and real-world systems, we demonstrated the improved performance of our biased MCMC methods compared to the regular MCMC method. In particular, we applied these algorithms to the problem of estimating protonation ensemble averages and titration curves of residues in a protein.

Keywords: Discrete Optimization; Energy Minimization; Ensemble Averages; Markov Chain Monte Carlo (MCMC); Protein Titration.
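
The general idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it does not reproduce either of their two biasing schemes. It assumes binary protonation states (1 = protonated), energies in units of kT, and a symmetric interaction matrix W. The function names (ph_fields, threshold_clusters, biased_mcmc_averages), the coupling cutoff, and the mixing probability p_cluster are all hypothetical. Clusters are found here by simple thresholding of |W| (a stand-in for whatever clustering method is actually used), and the biased move flips two sites of one cluster together. Because that proposal is symmetric, the plain Metropolis acceptance rule should keep the chain valid. Exact enumeration is included only as a small-system check.

```python
import itertools
import math
import random

def energy(state, h, W):
    """Pairwise-additive energy: sum_i h[i]*s_i + sum_{i<j} W[i][j]*s_i*s_j."""
    n = len(state)
    e = sum(h[i] * state[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e += W[i][j] * state[i] * state[j]
    return e

def ph_fields(pkas, ph):
    """Per-site protonation energy (kT units) at a given pH: ln(10)*(pH - pKa)."""
    return [math.log(10) * (ph - pka) for pka in pkas]

def exact_averages(h, W, beta=1.0):
    """Full enumeration over 2^n states; only feasible for small n."""
    n = len(h)
    z, acc = 0.0, [0.0] * n
    for state in itertools.product((0, 1), repeat=n):
        w = math.exp(-beta * energy(state, h, W))
        z += w
        for i in range(n):
            acc[i] += w * state[i]
    return [a / z for a in acc]

def threshold_clusters(W, cutoff):
    """Assumed clustering: connected components of the graph |W[i][j]| >= cutoff."""
    unseen = set(range(len(W)))
    clusters = []
    while unseen:
        stack, comp = [unseen.pop()], []
        while stack:
            i = stack.pop()
            comp.append(i)
            linked = [j for j in unseen if abs(W[i][j]) >= cutoff]
            unseen.difference_update(linked)
            stack.extend(linked)
        clusters.append(sorted(comp))
    return sorted(clusters)

def biased_mcmc_averages(h, W, clusters, n_steps=100_000, burn_in=5_000,
                         p_cluster=0.5, beta=1.0, seed=0):
    """Metropolis sampling mixing single-site flips with within-cluster pair flips."""
    rng = random.Random(seed)
    n = len(h)
    state = [0] * n
    e = energy(state, h, W)
    acc = [0] * n
    for step in range(n_steps + burn_in):
        if rng.random() < p_cluster:
            # Biased move: flip up to two sites drawn from one cluster.
            cluster = rng.choice(clusters)
            sites = rng.sample(cluster, min(2, len(cluster)))
        else:
            # Regular move: flip one site chosen uniformly.
            sites = [rng.randrange(n)]
        proposal = list(state)
        for i in sites:
            proposal[i] = 1 - proposal[i]
        e_new = energy(proposal, h, W)
        # Symmetric proposal, so the plain Metropolis rule applies.
        if e_new <= e or rng.random() < math.exp(-beta * (e_new - e)):
            state, e = proposal, e_new
        if step >= burn_in:
            for i in range(n):
                acc[i] += state[i]
    return [a / n_steps for a in acc]
```

Under these assumptions, a titration curve for site i would be traced by calling biased_mcmc_averages(ph_fields(pkas, ph), W, clusters) over a grid of pH values and plotting the i-th average against pH. For a handful of sites, the result can be compared with exact_averages as a sanity check.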