Multi-instance learning of graph neural networks for aqueous pKa prediction

Jiacheng Xiong; Zhaojun Li; Guangchao Wang; Zunyun Fu; Feisheng Zhong; Tingyang Xu; Xiaomeng Liu; Ziming Huang; Xiaohong Liu; Kaixian Chen; Hualiang Jiang; Mingyue Zheng

doi:10.1093/bioinformatics/btab714

Bioinformatics. 2022 Jan 12;38(3):792-798. doi: 10.1093/bioinformatics/btab714.

Authors

Jiacheng Xiong^{1,2}, Zhaojun Li³, Guangchao Wang⁴, Zunyun Fu¹, Feisheng Zhong^{1,2}, Tingyang Xu⁵, Xiaomeng Liu^{1,2}, Ziming Huang^{1,2}, Xiaohong Liu^{1,3,6}, Kaixian Chen^{1,2}, Hualiang Jiang^{1,2,6}, Mingyue Zheng^{1,2}

Affiliations

¹ Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China.
² College of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China.
³ Development Department, Suzhou Alphama Biotechnology Co., Ltd, Suzhou City 215000, China.
⁴ College of Computer and Information Engineering, Dezhou University, Dezhou City 253023, China.
⁵ Tencent AI Lab, Tencent, Shenzhen 518057, China.
⁶ Shanghai Institute for Advanced Immunochemical Studies, and School of Life Science and Technology, ShanghaiTech University, Shanghai 200031, China.

Abstract

Motivation: The acid dissociation constant (pKa) is a critical parameter that reflects the ionization ability of chemical compounds and is widely applied across a variety of industries. However, the experimental determination of pKa is intricate and time-consuming, especially when exact micro-pKa information at the atomic level is required. Hence, fast and accurate prediction of the pKa values of chemical compounds is of broad interest.

Results: Here, we compiled a large-scale pKa dataset containing 16 595 compounds with 17 489 pKa values. Based on this dataset, a novel pKa prediction model, named Graph-pKa, was established using graph neural networks. Graph-pKa performed well on the prediction of macro-pKa values, with a mean absolute error of around 0.55 and a coefficient of determination of around 0.92 on the test dataset. Furthermore, by incorporating multi-instance learning, Graph-pKa was also able to automatically deconvolute the predicted macro-pKa into discrete micro-pKa values.
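
As a rough illustration of the quantities named above, the following sketch shows how micro-pKa values of individual sites can be combined into a single macro-pKa, and how a mean absolute error and coefficient of determination are computed. It is not the Graph-pKa implementation. The function names are hypothetical, and the aggregation rule is the standard thermodynamic relation for the first dissociation step; assuming that it matches the model's multi-instance pooling is an assumption made here for illustration only.

```python
import math


def macro_pka_acidic(micro_pkas):
    """First macro-pKa of a compound with several acidic sites.

    Standard relation (assumed here, not taken from the paper):
    Ka_macro = sum_i Ka_i, with Ka_i = 10 ** (-pKa_i).
    """
    return -math.log10(sum(10.0 ** (-p) for p in micro_pkas))


def macro_pka_basic(micro_pkas):
    """Highest macro-pKa of a compound with several basic sites.

    micro_pkas are the pKa values of the conjugate acids of each site.
    Standard relation: 1 / Ka_macro = sum_i 1 / Ka_i.
    """
    return math.log10(sum(10.0 ** p for p in micro_pkas))


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Two equivalent acidic sites make the compound more acidic than either alone.
    print(macro_pka_acidic([4.0, 4.0]))
    # Two equivalent basic sites make the compound more basic than either alone.
    print(macro_pka_basic([9.0, 9.0]))
    # Toy measured vs. predicted values (made up for the example).
    print(mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]))
    print(r_squared([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]))
```

In a multi-instance setting, only the macro-pKa of the whole molecule is labeled. Per-site predictions would be pooled with a rule of this kind before comparison with the measured value, so the per-site values are learned without site-level labels.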

Availability and implementation: The Graph-pKa model is now freely accessible via a web-based interface (https://pka.simm.ac.cn/).

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Neural Networks, Computer*
Water* / chemistry

Substances

Water
