Prediction of aqueous solubility of organic compounds by the general solubility equation (GSE)

J Chem Inf Comput Sci. 2001 Sep-Oct;41(5):1208-17. doi: 10.1021/ci010287z.


The revised general solubility equation (GSE) is compared with four other methods (Huuskonen's artificial neural network (ANN) and three multiple linear regression (MLR) methods) in estimating the aqueous solubility of a test set of 21 pharmaceutically and environmentally interesting compounds. For this test set, the GSE and ANN predictions are clearly more accurate than those of the MLR methods. The GSE has the advantages of being simple and thermodynamically sound. Its only two inputs are the melting point in degrees Celsius (MP) and the octanol-water partition coefficient (K(ow)). The GSE uses no fitted parameters and no training data, whereas the other methods use a large number of parameters and require a training set. The GSE is also applied to a test set of 413 organic nonelectrolytes that were studied by Huuskonen. The average absolute error (AAE) is 0.54 log units using the GSE and 0.43 log units using Huuskonen's ANN model. Thus, although the GSE uses only two inputs and no training set, its AAE is only about 0.1 log units larger than that of the ANN, which requires many parameters and a large training set. This study provides evidence that the GSE is a convenient and reliable method for predicting the aqueous solubilities of organic compounds.
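
The abstract does not write the equation out. As a minimal sketch, assuming the commonly cited form of the revised GSE, log S(w) = 0.5 - 0.01(MP - 25) - log K(ow), with the melting point term set to zero for compounds that melt at or below 25 °C, the estimate and the AAE statistic might be computed as follows. The function names and the sample values are illustrative only and are not taken from the paper.

```python
def gse_log_solubility(mp_celsius, log_kow):
    """Estimate log10 of molar aqueous solubility from the revised GSE.

    Assumed form: log Sw = 0.5 - 0.01 * (MP - 25) - log Kow.
    """
    # Compounds that are liquid at 25 C get no melting point penalty.
    mp_term = max(mp_celsius - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_kow


def average_absolute_error(predicted, observed):
    """Mean of |predicted - observed| over paired log-solubility values."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - o) for p, o in zip(predicted, observed))
    return total / len(predicted)


# Hypothetical inputs (not data from the study): (MP in Celsius, log Kow).
compounds = [(125.0, 2.0), (10.0, 1.0)]
predicted = [gse_log_solubility(mp, lk) for mp, lk in compounds]

# Hypothetical "observed" log solubilities, for illustration only.
observed = [-2.0, -1.0]
aae = average_absolute_error(predicted, observed)
```

Because the equation has no fitted coefficients, nothing in this sketch needs a training set: each prediction depends only on the two measured or estimated inputs for that compound.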