Statistical characteristics of amino acid covariance as possible descriptors of viral genomic complexity

Sci Rep. 2019 Dec 5;9(1):18410. doi: 10.1038/s41598-019-54720-y.

Abstract

At the sequence level, it is hard to describe the complexity that allows viruses to challenge the host immune system, some for a few weeks and others up to complete compromise. Paradoxically, viral genomes are both complex and simple: complex because amino acid mutation rates are very high, yet viruses remain functional; simple because they have only around 10 types of proteins, so viral protein-protein interaction networks are not insightful. In this work, we use fine-grained amino acid level information and its evolutionary characteristics, obtained from large-scale genomic data, to develop a statistical panel toward the goal of quantitative descriptors for the biological complexity of viruses. Networks were constructed from pairwise covariation of amino acids and were statistically analyzed. Three differentiating factors emerge: predominantly intra- vs inter-protein covariance relations, the nature of the node degree distribution, and network density. Interestingly, the covariance relations were primarily intra-protein in avian influenza and inter-protein in HIV. The degree distributions showed two universality classes: a power law with exponent -1 in HIV and avian influenza, and random behavior in human flu and dengue. The calculated covariance network density correlates well with the mortality strengths of viruses on the viral Richter scale. These observations suggest the potential utility of these statistical metrics for describing covariance patterns in viruses. Our host-virus interaction analysis points to the possibility that host proteins which can interact with multiple viral proteins may be responsible for shaping the inter-protein covariance relations. With the available data, network density appears to be a possible surrogate for the virus Richter scale; however, this hypothesis needs re-examination when large-scale complete genome data for more viruses become available.
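To make the three network descriptors concrete, here is a minimal sketch, not the authors' pipeline, that summarizes a covariance network given as a list of covarying residue pairs. It assumes each residue is labeled by a hypothetical (protein, position) tuple, that nodes are only residues appearing in at least one pair, and that density is the standard undirected value 2E / (N(N - 1)); the paper's covariance measure, significance threshold and node set are not specified here. The toy edges are invented for illustration and are not data from the study. Fitting a power-law exponent to the degree histogram would need a separate step that is not shown.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical residue label: (protein name, position).
Residue = Tuple[str, int]
Edge = Tuple[Residue, Residue]


def network_stats(edges: Iterable[Edge]) -> dict:
    """Summarize an undirected covariance network given as residue pairs.

    Assumption: nodes are only the residues that occur in at least one
    covarying pair; no covariance scoring or thresholding is done here.
    """
    # Treat (a, b) and (b, a) as one undirected edge; drop self-pairs.
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    nodes = {r for e in unique for r in e}
    n, m = len(nodes), len(unique)

    # Density: observed edges over the n*(n-1)/2 possible undirected edges.
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    # An intra-protein edge joins two residues of the same protein.
    intra = sum(1 for a, b in unique if a[0] == b[0])

    # Degree of each residue, then how many residues have each degree.
    degree = Counter(r for e in unique for r in e)
    degree_hist = Counter(degree.values())

    return {
        "nodes": n,
        "edges": m,
        "density": density,
        "intra_fraction": intra / m if m else 0.0,
        "degree_histogram": dict(sorted(degree_hist.items())),
    }


# Invented toy pairs for illustration only (not from the study).
toy_edges = [
    (("HA", 10), ("HA", 55)),
    (("HA", 10), ("NA", 3)),
    (("NA", 3), ("HA", 10)),  # reversed duplicate, counted once
    (("NA", 3), ("NA", 40)),
    (("PB2", 627), ("HA", 55)),
]

if __name__ == "__main__":
    print(network_stats(toy_edges))
```

In this framing, a network dominated by intra-protein pairs would show an `intra_fraction` near 1, and comparing `density` across viruses is the kind of quantity the abstract relates to the Richter scale.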

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Birds / virology
  • Computational Biology / methods
  • Computational Biology / statistics & numerical data*
  • Dengue / genetics
  • Dengue / virology
  • Dengue Virus / classification
  • Dengue Virus / genetics*
  • Dengue Virus / metabolism
  • Evolution, Molecular
  • Gene Regulatory Networks
  • Genetic Variation
  • Genome, Viral*
  • HIV Infections / genetics
  • HIV Infections / virology
  • HIV-1 / classification
  • HIV-1 / genetics*
  • HIV-1 / metabolism
  • Hepatitis B / genetics
  • Hepatitis B / virology
  • Hepatitis B virus / classification
  • Hepatitis B virus / genetics*
  • Hepatitis B virus / metabolism
  • Host-Pathogen Interactions / genetics
  • Humans
  • Influenza A virus / classification
  • Influenza A virus / genetics*
  • Influenza A virus / metabolism
  • Influenza in Birds / genetics
  • Influenza in Birds / virology
  • Influenza, Human / genetics
  • Influenza, Human / virology
  • Phylogeny