Background: Phosphorylation of histidine residues plays crucial roles in signaling pathways and cell metabolism in prokaryotes such as bacteria. While evidence has emerged that protein histidine phosphorylation (pHis) also occurs in more complex organisms, its role in mammalian cells remains largely uncharted. It is therefore highly desirable to develop computational tools that can identify histidine phosphorylation sites. Results: Here, we introduce PROSPECT, a tool that enables fast and accurate prediction of proteome-wide histidine phosphorylation substrates and sites. PROSPECT is based on a hybrid method that integrates the outputs of two convolutional neural network (CNN)-based classifiers and one random forest-based classifier. Three feature encodings, namely one-of-K coding, enhanced grouped amino acids content (EGAAC), and composition of k-spaced amino acid group pairs (CKSAAGP), were used as the inputs to the three classifiers, respectively. Our results show that PROSPECT can accurately predict histidine phosphorylation sites from sequence information alone. The PROSPECT web server is user-friendly and publicly available at http://PROSPECT.erc.monash.edu/. Conclusions: PROSPECT outperforms other pHis predictors in both running speed and prediction accuracy, and we anticipate that the PROSPECT web server will become a popular tool for identifying pHis sites in bacteria.
Keywords: Protein phosphorylation; bioinformatics; deep learning; histidine phosphorylation; pattern recognition; sequence analysis.
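To make the one-of-K coding named in the abstract concrete, the following is a minimal sketch of how a sequence window centred on a histidine could be turned into a binary matrix. It is an illustration only and not the PROSPECT implementation: the function names, the flank size, the `-` padding character, and the choice to encode padding or non-standard residues as all-zero rows are assumptions made here.

```python
# Illustrative sketch only; names, window size and padding are assumptions,
# not details taken from PROSPECT.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
PAD = "-"                             # assumed padding symbol for sequence ends


def histidine_windows(sequence, flank=3):
    """Yield (position, window) for every histidine in the sequence.

    Each window has length 2 * flank + 1 and is centred on the histidine;
    positions beyond the sequence ends are filled with PAD.
    """
    padded = PAD * flank + sequence + PAD * flank
    for i, residue in enumerate(sequence):
        if residue == "H":
            yield i, padded[i:i + 2 * flank + 1]


def one_of_k(window):
    """Encode a window as a list of 20-dimensional binary rows.

    Each standard residue sets exactly one position to 1; padding and
    non-standard residues are left as all-zero rows (an assumption).
    """
    index = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in window:
        row = [0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    for position, window in histidine_windows("MKHAH", flank=2):
        matrix = one_of_k(window)
        print(position, window, len(matrix), "x", len(matrix[0]))
```

A matrix of this shape (window length by 20) is the kind of input a CNN-based classifier can consume directly, whereas composition-style encodings such as EGAAC and CKSAAGP summarise the window as fixed-length count or frequency vectors.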