Decision tree based ensemble machine learning model for the prediction of Zika virus T-cell epitopes as potential vaccine candidates

Sci Rep. 2022 May 12;12(1):7810. doi: 10.1038/s41598-022-11731-6.


Zika fever is an infectious disease caused by the Zika virus (ZIKV). The disease is claiming millions of lives worldwide, primarily in developing countries. In addition to vector control strategies, the most effective way to prevent the spread of ZIKV infection is vaccination. There is no clinically approved vaccine to combat ZIKV infection and curb its pandemic. An epitope-based peptide vaccine (EBPV) is seen as a powerful alternative to conventional vaccinations because of its low production cost and short production time. Nonetheless, EBPVs have gotten less attention, despite the fact that they have a significant untapped potential for enhancing vaccine safety, immunogenicity, and cross-reactivity. Such a vaccine technology is based on target pathogen's selected antigenic peptides called T-cell epitopes (TCE), which are synthesized chemically based on their amino acid sequences. The identification of TCEs using wet-lab experimental approach is challenging, expensive, and time-consuming. Therefore in this study, we present computational model for the prediction of ZIKV TCEs. The model proposed is an ensemble of decision trees that utilizes the physicochemical properties of amino acids. In this way a large amount of time and efforts would be saved for quick vaccine development. The peptide sequences dataset for model training was retrieved from Virus Pathogen Database and Analysis Resource (ViPR) database. The sequences dataset consist of experimentally verified T-cell epitopes (TCEs) and non-TCEs. The model demonstrated promising results when evaluated on test dataset. The evaluation metrics namely, accuracy, AUC, sensitivity, specificity, Gini and Mathew's correlation coefficient (MCC) recorded values of 0.9789, 0.984, 0.981, 0.987, 0.974 and 0.948 respectively. The consistency and reliability of the model was assessed by carrying out the five (05)-fold cross-validation technique, and the mean accuracy of 0.97864 was reported. Finally, model was compared with standard machine learning (ML) algorithms and the proposed model outperformed all of them. The proposed model will aid in predicting novel and immunodominant TCEs of ZIKV. The predicted TCEs may have a high possibility of acting as prospective vaccine targets subjected to in-vivo and in-vitro scientific assessments, thereby saving lives worldwide, preventing future epidemic-scale outbreaks, and lowering the possibility of mutation escape.