BERT-Kcr: Prediction of lysine crotonylation sites by a transfer learning method with pre-trained BERT models

Bioinformatics. 2021 Oct 13;btab712. doi: 10.1093/bioinformatics/btab712. Online ahead of print.

Abstract

Motivation: As one of the most important post-translational modifications (PTMs), protein lysine crotonylation (Kcr) has attracted wide attention because it is involved in important physiological activities such as cell differentiation and metabolism. However, experimental methods for identifying Kcr sites are expensive and time-consuming. Computational methods, by contrast, can predict Kcr sites in silico with high efficiency and at low cost.

Results: In this study, we proposed a novel predictor, BERT-Kcr, for protein Kcr site prediction, which was developed using a transfer learning method with pre-trained bidirectional encoder representations from transformers (BERT) models. These models were originally developed for natural language processing (NLP) tasks such as sentence classification. Here, we treated each amino acid as a word and supplied the resulting sequence of words as input to the pre-trained BERT model. The features encoded by BERT were extracted and then fed to a BiLSTM network to build our final model. Compared with models built with other machine learning and deep learning classifiers, BERT-Kcr achieved the best performance, with an AUROC of 0.983 in 10-fold cross-validation. Further evaluation on the independent test set indicates that BERT-Kcr outperforms the state-of-the-art model Deep-Kcr, with an improvement of about 5% in AUROC. These results indicate that directly using sequence information together with advanced pre-trained NLP models could be an effective way to identify post-translational modification sites in proteins.
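
The step of treating each amino acid as a word can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `lysine_windows` and `to_bert_sentence`, the padding symbol `X`, and the window half-width are all assumptions made for illustration, since the abstract does not state the window size or padding scheme. The sketch extracts a fixed-length peptide window centered on each lysine (K) and turns it into a space-separated "sentence" of single-residue words, the form in which a sequence could be passed to a BERT tokenizer.

```python
from typing import List, Tuple

PAD = "X"  # assumed padding symbol for windows that run past the sequence ends


def lysine_windows(sequence: str, half_width: int = 15) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine in the sequence.

    Each window has length 2 * half_width + 1 with the lysine at its center.
    The default half_width is a hypothetical choice, not taken from the paper.
    """
    seq = sequence.upper()
    padded = PAD * half_width + seq + PAD * half_width
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # Index i in seq corresponds to index i + half_width in padded,
            # so this slice is centered on the lysine.
            windows.append((i + 1, padded[i : i + 2 * half_width + 1]))
    return windows


def to_bert_sentence(window: str) -> str:
    """Turn a peptide window into a sentence with one word per amino acid."""
    return " ".join(window)


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # toy sequence, not real data
    for position, window in lysine_windows(example, half_width=3):
        print(position, to_bert_sentence(window))
```

In a full pipeline along the lines described above, each such sentence would be encoded by the pre-trained BERT model and the resulting per-residue features passed to the BiLSTM classifier; those stages are omitted here because the abstract gives no implementation details for them.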

Availability: The BERT-Kcr model is publicly available at http://zhulab.org.cn/BERT-Kcr_models/.

Supplementary information: Supplementary data are available at Bioinformatics online.