Patterns (N Y). 2023 Apr 24;4(6):100738.
doi: 10.1016/j.patter.2023.100738. eCollection 2023 Jun 9.

Network embedding unveils the hidden interactions in the mammalian virome

Timothée Poisot et al. Patterns (N Y).

Abstract

Predicting host-virus interactions is fundamentally a network science problem. We develop a method for bipartite network prediction that combines a recommender system (linear filtering) with an imputation algorithm based on low-rank graph embedding. We test this method by applying it to a global database of mammal-virus interactions and show that it makes biologically plausible predictions that are robust to data biases. We find that the mammalian virome is under-characterized everywhere in the world. We suggest that future virus discovery efforts could prioritize the Amazon Basin (for its unique coevolutionary assemblages) and sub-Saharan Africa (for its poorly characterized zoonotic reservoirs). Graph embedding of the imputed network improves predictions of human infection from viral genome features, providing a shortlist of priorities for laboratory studies and surveillance. Overall, our study indicates that the global structure of the mammal-virus network contains a large amount of recoverable information, which provides new insights into fundamental biology and disease emergence.
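
The combination described in the abstract (a linear filter supplying an initial score, followed by low-rank imputation) can be illustrated with a minimal sketch. This is not the authors' implementation: the filter weights `alpha`, the `rank`, the iteration count `n_iter`, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def linear_filter(Y, alpha=(0.0, 1.0, 1.0, 1.0)):
    """Score every cell from its own value, its row mean, its column mean,
    and the overall mean. The weights are illustrative assumptions."""
    a = np.asarray(alpha, dtype=float)
    a = a / a.sum()
    row = Y.mean(axis=1, keepdims=True)   # per-row mean
    col = Y.mean(axis=0, keepdims=True)   # per-column mean
    return a[0] * Y + a[1] * row + a[2] * col + a[3] * Y.mean()

def low_rank(Y, rank):
    """Best rank-`rank` approximation of Y via truncated SVD."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def impute_cell(Y, i, j, rank, n_iter=20):
    """Re-estimate one unobserved interaction (i, j).

    The cell starts at its linear-filter score and is then repeatedly
    overwritten with the matching entry of the low-rank approximation."""
    W = np.array(Y, dtype=float)
    W[i, j] = linear_filter(W)[i, j]
    for _ in range(n_iter):
        W[i, j] = low_rank(W, rank)[i, j]
    return W[i, j]

# Toy host-by-virus matrix: every interaction is observed except (0, 0).
Y = np.ones((4, 4))
Y[0, 0] = 0.0
score = impute_cell(Y, 0, 0, rank=1)
```

In this toy case every other host-virus pair interacts, so the rank-1 structure pulls the score for the missing cell close to 1. On real data the score would be compared against a threshold before being treated as a predicted interaction.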

Keywords: imputation; singular value decomposition; virome; zoonotic viruses.

Conflict of interest statement

The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
Mammal biodiversity and sampling bias shape the geography of predicted interactions. (A) The total number of interactions recorded does not track the global distribution of mammalian richness, with an overwhelming density of interactions in Europe. (B) Known zoonotic hosts are concentrated in the Amazon, an area with comparatively fewer known host-virus interactions; the distribution of known zoonotic hosts closely tracks the global richness of mammals. (C and D) Post imputation, the model predicts strong increases in the number of interactions (C) in the Amazon and Central Europe but an increase in the number of zoonotic hosts primarily concentrated in Africa (D). As a result, we expect the Amazon to be a hotspot of novel interactions and Africa to be a hotspot of novel zoonotic hosts (i.e., the increase is greater than expected, given the known quantities in these places).
Figure 2
The global virome pre and post imputation. Network layouts reflect the first two dimensions of a t-SNE embedding on four dimensions, where the initial positions of nodes were picked based on a PCA. Hosts are shown as circles and viruses as downward-pointing triangles, and the relative size of each point scales linearly with degree (using the same scale for both figures; i.e., two nodes with the same degree will have the same size on the left and right).
Figure 3
Network imputation reveals a hotspot of unique host-virus associations in the Amazon. (A and B) The compositional uniqueness of host-virus interactions remains similarly distributed in the pre-imputation (A) and post-imputation (B) networks. (C) Nevertheless, the largest hotspot in gain of interaction uniqueness is in the Amazon. (D) The predicted hotspots of uniqueness gain closely follow the originality of the host compositions, suggesting that more unique mammal assemblages have more original host-virus networks. Hotspots are given as the difference in uniqueness post and pre imputation, both rescaled between 0 and 1.
Figure 4
Predictive performance of LF-SVD generally increases with increased connectivity. Points represent individual host species and show the probability that a randomly sampled virus known to infect that host will be ranked above a randomly sampled virus that has not been observed to do so (measured as the area under the receiver operating characteristic curve [ROC-AUC]). While hosts subject to extreme study bias, such as humans, cannot be predicted, this does not appear to degrade performance on other species.
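
The probabilistic reading of ROC-AUC used in this caption can be computed directly by comparing every (known, unobserved) pair of scores. The sketch below is a generic illustration with made-up scores, not the authors' evaluation code; the function name `rank_auc` is hypothetical, and ties are counted as half a win by the usual convention.

```python
def rank_auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative.

    pos_scores: model scores for viruses known to infect the host.
    neg_scores: model scores for viruses not observed to infect it.
    Ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Illustrative scores only: three of the four pairs are ordered correctly.
auc = rank_auc([0.9, 0.8], [0.1, 0.85])
```

A value of 0.5 corresponds to random ranking and 1.0 to every known virus being ranked above every unobserved one.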
Figure 5
Network embeddings improved the ability to identify viruses that can infect humans. (A) An existing model of human infection risk using virus genomic features is improved when network embeddings are added as virus traits; models that use embeddings from the imputed network perform better than those using the observed network. Violin plots and boxplots show the ROC-AUC for test set predictions across 1,000 replicate 70%:15%:15% train:calibrate:test splits (n = 612). p-values from pairwise Kruskal-Wallis rank-sum tests are shown for all comparisons. Diamonds indicate the performance of a bagged model that averages predictions from the 100 best-performing models based on test set AUC iteratively re-calculated while excluding the virus being predicted. Mean AUC: genome composition model = 0.723; genome composition + observed network = 0.830; genome composition + imputed network = 0.875. (B) Predictive feature importance in the combined (genome composition + imputed network) model; network embeddings are consistently the top predictive features compared with biologically informative measures of genome composition.
Figure 6
Ranking viruses by their predicted probability of human infection accurately predicts known infections. Viruses are arranged by the mean prediction produced by a bagged version of the model trained on genome composition features and an embedding representing the imputed network (panel A; black line). Error bars show the region containing 95% of the predictions used for bagging. Dashed lines highlight the cutoff that maximizes informedness (Youden’s J) when converting mean predicted probabilities to binary predictions. Panel B shows the most reliable detection method providing evidence of human infection for each virus in the CLOVER database. For the purposes of model training, viruses linked to humans through serological detection only or where the detection method was unspecified were labeled negative; the model nevertheless identifies the majority of these as human infecting.
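
Choosing a cutoff that maximizes informedness (Youden's J = sensitivity + specificity - 1) can be sketched as a scan over candidate thresholds. The example below is a generic illustration with invented probabilities and labels; the function name `youden_cutoff` is hypothetical, and the sketch assumes at least one positive and one negative label.

```python
def youden_cutoff(probs, labels):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    probs: predicted probabilities; labels: 1 = positive, 0 = negative.
    A prediction counts as positive when its probability is >= cutoff.
    Assumes both classes are present; the lowest best cutoff wins ties."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= cut)
        tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < cut)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Illustrative data only.
cutoff, j = youden_cutoff([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Here the scan settles on a cutoff of 0.35, which captures both positives at the cost of one false positive (J = 0.5).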
