Predicting virus-host association by Kernelized logistic matrix factorization and similarity network fusion

BMC Bioinformatics. 2019 Dec 2;20(Suppl 16):594. doi: 10.1186/s12859-019-3082-0.


Background: Viruses are closely linked to bacteria and human diseases. Predicting associations between viruses and hosts is of great significance for understanding the dynamics and complex functional networks of microbial communities. With the rapid development of metagenomic sequencing, methods based on sequence similarity and genomic homology have been used to predict virus-host associations. However, these methods ignore the known virus-host association network.

Results: We propose ILMF-VH, a kernelized logistic matrix factorization model that integrates different sources of information to predict potential virus-host associations on a heterogeneous network. The heterogeneous network is constructed by connecting a virus network with a host network through known virus-host associations. The virus network is built from oligonucleotide frequency measures, and the host network is built by integrating oligonucleotide frequency similarity and Gaussian interaction profile kernel similarity through similarity network fusion. The host prediction accuracy of our method is higher than that of the other methods compared. In addition, case studies show that the host of crAssphage predicted by ILMF-VH is consistent with the presumed host reported in previous studies, and Escherichia coli is predicted as another potential host.
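
The Gaussian interaction profile kernel similarity used for the host network can be illustrated with a minimal sketch. It assumes the commonly used definition, in which each host is represented by its row of a binary host-by-virus association matrix and the bandwidth is normalized by the mean squared norm of those rows. The function name `gip_kernel`, the parameter `gamma_prime`, and the toy matrix are illustrative assumptions and are not taken from the paper, whose exact normalization is not stated in this abstract.

```python
import numpy as np


def gip_kernel(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `assoc`.

    assoc: binary matrix, rows = hosts, columns = viruses (hypothetical layout).
    gamma_prime: bandwidth scaling factor (assumed default of 1.0).
    """
    profiles = np.asarray(assoc, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    # Normalize the bandwidth by the average squared profile norm.
    gamma = gamma_prime / mean_sq_norm
    # Pairwise squared Euclidean distances between interaction profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)


# Toy example: 3 hosts x 4 viruses, 1 = known association (made-up data).
host_virus = np.array([
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
])
host_similarity = gip_kernel(host_virus)
```

In this toy matrix the first two hosts share an identical interaction profile, so their kernel similarity is 1, while the third host, which shares no viruses with them, receives a much lower similarity. A matrix of this kind would be one of the inputs to the fusion step, alongside the oligonucleotide frequency similarity.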

Conclusions: The proposed model is an effective computational tool for predicting virus-host interactions, and it has great potential for discovering novel hosts of viruses.

Keywords: Gaussian interaction profile; Logistic matrix factorization; Oligonucleotide frequency; Similarity network fusion; Virus-host association.

MeSH terms

  • Algorithms*
  • Area Under Curve
  • Databases as Topic
  • Host-Pathogen Interactions
  • Humans
  • Logistic Models
  • Viruses / genetics*