Calculation of molecular lipophilicity: State-of-the-art and comparison of log P methods on more than 96,000 compounds
- PMID: 18683876
- DOI: 10.1002/jps.21494
Abstract
We first review the state of the art in the development of log P prediction approaches, which fall into two major categories: substructure-based and property-based methods. Then, we compare the predictive power of representative methods for one public dataset (N = 266) and two in-house datasets from Nycomed (N = 882) and Pfizer (N = 95,809). A total of 30 and 18 methods were tested for the public and industrial datasets, respectively. Accuracy of models declined with the number of nonhydrogen atoms. The Arithmetic Average Model (AAM), which predicts the same value (the arithmetic mean) for all compounds, was used as a baseline model for comparison. Methods with a Root Mean Squared Error (RMSE) greater than the RMSE produced by the AAM were considered unacceptable. The majority of analyzed methods produced reasonable results for the public dataset, but only seven methods were successful on both in-house datasets. We proposed a simple equation based on the number of carbon atoms, NC, and the number of hetero atoms, NHET: log P = 1.46(±0.02) + 0.11(±0.001) NC − 0.11(±0.001) NHET. This equation outperformed a large number of programs benchmarked in this study. Factors influencing the accuracy of log P predictions were elucidated and discussed.
(c) 2008 Wiley-Liss, Inc. and the American Pharmacists Association
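The simple equation and the AAM acceptance criterion described in the abstract can be illustrated with a short Python sketch. It is not the authors' code. It assumes that "hetero atoms" means every atom other than carbon and hydrogen, that the AAM mean is taken over the same set of values being evaluated, and that the input is a flat molecular formula without brackets or charges. The function names (`count_atoms`, `simple_logp`, `aam_rmse`, `is_acceptable`) are hypothetical, and the uncertainties on the coefficients are ignored.

```python
import math
import re
from collections import Counter


def count_atoms(formula: str) -> Counter:
    """Count atoms in a flat molecular formula such as 'C9H8O4'.

    Assumption: no brackets, charges, or isotope labels.
    """
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def simple_logp(formula: str) -> float:
    """Evaluate log P = 1.46 + 0.11 * NC - 0.11 * NHET from the abstract.

    Assumption: NHET counts every atom that is neither carbon nor hydrogen.
    """
    counts = count_atoms(formula)
    n_c = counts["C"]
    n_het = sum(n for element, n in counts.items() if element not in ("C", "H"))
    return 1.46 + 0.11 * n_c - 0.11 * n_het


def rmse(predicted, observed) -> float:
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def aam_rmse(observed) -> float:
    """RMSE of the Arithmetic Average Model: predict the mean for every compound."""
    mean = sum(observed) / len(observed)
    return rmse([mean] * len(observed), observed)


def is_acceptable(predicted, observed) -> bool:
    """A method is unacceptable if its RMSE exceeds the AAM baseline RMSE."""
    return rmse(predicted, observed) <= aam_rmse(observed)


if __name__ == "__main__":
    # Formulas chosen only to exercise the equation.
    for formula in ("C6H6", "C9H8O4"):
        print(formula, round(simple_logp(formula), 2))

    # Hypothetical observed and predicted values, for illustration only.
    observed = [1.0, 2.0, 3.0]
    predicted = [1.1, 2.1, 2.9]
    print(is_acceptable(predicted, observed))
```

Because the AAM ignores molecular structure entirely, its RMSE equals the population standard deviation of the measured values, which is why it serves as a lower bar that any useful method should clear.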
