Small molecule distribution coefficients between immiscible nonaqueous and aqueous phases, such as cyclohexane and water, measure the degree to which small molecules prefer one phase over another at a given pH. Because distribution coefficients capture both thermodynamic effects (the free energy of transfer between phases) and chemical effects (protonation state and tautomer effects in aqueous solution), they provide an exacting test of the thermodynamic and chemical accuracy of physical models without the long correlation times inherent to the prediction of more complex properties relevant to drug discovery, such as protein-ligand binding affinities. For the SAMPL5 challenge, we carried out a blind prediction exercise in which participants were tasked with predicting distribution coefficients, in order to assess the potential of this task as a new route for the evaluation and systematic improvement of predictive physical models. Distribution coefficients are typically measured for octanol-water, but we opted to use cyclohexane as the nonpolar phase. Cyclohexane was suggested as a way to avoid issues with the high water content and persistent heterogeneous structure of water-saturated octanol phases, since cyclohexane has a much lower water content and a homogeneous liquid structure. Using a modified shake-flask LC-MS/MS protocol, we collected cyclohexane/water distribution coefficients for a set of 53 druglike compounds at pH 7.4. These measurements formed the basis of the SAMPL5 Distribution Coefficient Challenge, in which 18 research groups predicted the distribution coefficients before the experimental values reported here were released. In this work, we describe the experimental protocol we used to measure cyclohexane-water distribution coefficients, report the measured data, propose a new bootstrap-based data analysis procedure that incorporates multiple sources of experimental error, and provide insights to help guide future iterations of this valuable exercise in predictive modeling.
Keywords: Blind challenge; Distribution coefficients; Partition coefficients; Predictive modeling; SAMPL.
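As a rough illustration of the quantities named in the abstract, the sketch below computes a log distribution coefficient as the base-10 logarithm of the ratio of solute concentration in cyclohexane to that in water, then estimates its uncertainty with a simple nonparametric bootstrap over replicate measurements. This is not the procedure proposed in this work, which is not detailed here and incorporates multiple sources of experimental error; the function names `log_d` and `bootstrap_log_d`, the replicate-resampling scheme, and the example concentrations are all assumptions made for illustration.

```python
import numpy as np


def log_d(conc_cyclohexane, conc_water):
    """Return log10(C_cyclohexane / C_water) for each replicate pair."""
    chx = np.asarray(conc_cyclohexane, dtype=float)
    wat = np.asarray(conc_water, dtype=float)
    return np.log10(chx / wat)


def bootstrap_log_d(conc_cyclohexane, conc_water, n_boot=2000, seed=0):
    """Mean log D and its bootstrap standard error.

    Assumption: replicates are resampled with replacement as the only
    source of variability; other error sources are not modeled here.
    """
    rng = np.random.default_rng(seed)
    values = log_d(conc_cyclohexane, conc_water)
    n = values.size
    # Each row of idx is one bootstrap resample of the replicate indices.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = values[idx].mean(axis=1)
    return values.mean(), boot_means.std(ddof=1)


# Hypothetical replicate concentrations in arbitrary but matching units.
mean_log_d, std_err = bootstrap_log_d([8.0, 10.0, 12.5], [1.0, 1.0, 1.0])
print(f"log D = {mean_log_d:.2f} +/- {std_err:.2f}")
```

Resampling whole replicate pairs keeps the two phase concentrations from the same experiment together, which is one simple design choice; a fuller treatment would also need to propagate the other experimental uncertainties.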