DeepBindPoc: a deep learning method to rank ligand binding pockets using molecular vector representation

PeerJ. 2020 Apr 6;8:e8864. doi: 10.7717/peerj.8864. eCollection 2020.


Accurate identification of ligand-binding pockets in a protein is important for structure-based drug design. In recent years, several deep learning models have been developed that learn physicochemical and spatial information to predict ligand-binding pockets in a protein. However, ranking the native ligand-binding pockets within a pool of predicted pockets remains difficult for computational molecular biologists using a single web-based tool. We therefore hypothesize that training on a data set closer to real applications, and supplying ligand information, yields a model that identifies pockets more accurately. In this article, we propose a new deep learning method called DeepBindPoc for identifying and ranking ligand-binding pockets in proteins. The model is built using information about both the binding pocket and its associated ligand. We use the mol2vec tool to represent the given ligand and pocket as vectors, which serve as input to a densely fully connected neural network model. During training, features important for pocket-ligand binding are extracted automatically and high-level information is preserved. DeepBindPoc showed a strong complementary advantage in detecting native-like pockets when combined with popular traditional methods such as fpocket and P2Rank. The proposed method was tested and validated with standard procedures on multiple datasets, including a dataset of G protein-coupled receptors. This systematic testing and validation suggests that DeepBindPoc is a valuable tool for ranking near-native pockets in theoretically modeled proteins whose active sites have not been determined experimentally but whose ligands are known. The DeepBindPoc model described in this article is available on GitHub, and a webserver is also available.
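The ranking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the pocket and ligand are already available as fixed-length vectors (in the paper these come from mol2vec; here they are random placeholders), and it reads "densely fully connected" as each layer receiving the concatenation of all earlier features. The function names `dense_score` and `rank_pockets`, the layer sizes, and the random weights are all hypothetical.

```python
import math
import random


def dense_score(x, layers, w_out):
    """Score one feature vector with densely connected layers.

    Assumption: each layer sees the input plus every earlier layer's output.
    """
    features = list(x)
    for w in layers:
        # One row of weights per hidden unit, followed by a ReLU.
        h = [max(0.0, sum(wi * fi for wi, fi in zip(row, features))) for row in w]
        features = features + h  # dense connectivity: keep all earlier features
    logit = sum(wi * fi for wi, fi in zip(w_out, features))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid: binding-likeness in (0, 1)


def rank_pockets(ligand_vec, pocket_vecs, layers, w_out):
    """Return (pocket_index, score) pairs sorted from best to worst."""
    scored = [
        (i, dense_score(list(p) + list(ligand_vec), layers, w_out))
        for i, p in enumerate(pocket_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder data: random vectors stand in for mol2vec embeddings,
# and random weights stand in for a trained model.
rng = random.Random(0)
DIM, HIDDEN, N_LAYERS, N_POCKETS = 8, 4, 3, 4  # illustrative sizes only

ligand_vec = [rng.gauss(0, 1) for _ in range(DIM)]
pocket_vecs = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_POCKETS)]

layers, in_dim = [], 2 * DIM
for _ in range(N_LAYERS):
    layers.append([[rng.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(HIDDEN)])
    in_dim += HIDDEN  # the next layer also receives this layer's output
w_out = [rng.gauss(0, 0.1) for _ in range(in_dim)]

ranking = rank_pockets(ligand_vec, pocket_vecs, layers, w_out)
for index, score in ranking:
    print(f"pocket {index}: score {score:.3f}")
```

In a realistic setting, the candidate pockets would come from a pocket detector such as fpocket or P2Rank, and the weights would be learned from labeled pocket-ligand pairs rather than drawn at random.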

Keywords: Deep neural network; Densely fully connected neural network; Ligand pocket identification; Mol2vec; Protein–ligand interactions.

Grant support

This work was supported by the National Key Research and Development Program of China under grant nos. 2018YFB0204403 and 2016YFB0201305; the Shenzhen Basic Research Fund under grant nos. JCYJ20180507182818013, GGFW2017073114031767 and JCYJ20170413093358429; the National Science Foundation of China under grant nos. U1435215 and 61433012; the National Natural Youth Science Foundation of China (grant no. 31601028); the China Postdoctoral Science Foundation (grant no. 2019M653132); and the CAS Key Lab under grant no. 2011DP173015. This work was also supported by the Shenzhen Discipline Construction Project for Urban Computing and Data Intelligence, and by the Youth Innovation Promotion Association, CAS, to Yanjie Wei. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.