VHI-Pred: A Multi-Feature-Based Tool for Predicting Human-Virus Protein-Protein Interactions

Mol Biotechnol. 2025 Apr 5. doi: 10.1007/s12033-025-01417-5. Online ahead of print.

Abstract

Viral diseases pose a significant threat to public health, highlighting the importance of understanding protein-protein interactions between hosts and viruses for therapeutic development. However, identifying these interactions experimentally is often expensive and time-consuming, especially given the rapid evolution of viruses. Machine learning algorithms and artificial intelligence have emerged as powerful tools for efficiently identifying these interactions. This study aims to develop a machine learning-based model to predict protein interactions between viral pathogens and human hosts while analyzing the factors influencing these interactions. The prediction model was constructed using three machine learning algorithms: Random Forest (RF), XGBoost (XGB), and Artificial Neural Networks (ANN). Each algorithm underwent rigorous testing. The modeling features included physicochemical properties, motifs, and amino acid sequences. Model performance was evaluated using fitness, accuracy, precision, sensitivity, and specificity metrics, with validation conducted via k-fold cross-validation. The accuracy of the RF, XGB, and ANN models was 87%, 86%, and 86%, respectively. By integrating dimensionality reduction and clustering techniques, the accuracy of the RF model improved to 90%. Traditionally, studying host-pathogen interactions is labor-intensive and costly. The integration of machine learning algorithms into this field significantly enhances the efficiency of analyzing viral pathogen-human host interactions. This study demonstrates the effectiveness of such an approach and provides valuable insights for future research. The results are accessible to researchers through a web application at http://vhi.sysbiomed.ir.

Keywords: Artificial intelligence; Human host; Machine learning; Prediction tool; Viral pathogen.
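
The evaluation described in the abstract (accuracy, precision, sensitivity, and specificity under k-fold validation) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `binary_metrics` and `k_fold_indices` are hypothetical, the labels are assumed to be binary (1 = interacting pair, 0 = non-interacting pair), and the folds are contiguous and unshuffled for simplicity. The abstract does not state the value of k or the exact splitting protocol, so those remain assumptions.

```python
from typing import Dict, List, Tuple


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute standard binary-classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num: int, den: int) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "specificity": ratio(tn, tn + fp),
    }


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k contiguous (train, test) folds.

    Assumption: no shuffling or stratification; a real pipeline would
    usually shuffle and stratify by class before splitting.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Distribute the remainder so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds
```

In use, a model would be trained on each `train` index set, predictions on the matching `test` set would be passed to `binary_metrics`, and the per-fold metrics would be averaged across the k folds.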