The use of knowledge management tools in viroinformatics. Example study of a highly conserved sequence motif in Nsp3 of SARS-CoV-2 as a therapeutic target

Comput Biol Med. 2020 Oct:125:103963. doi: 10.1016/j.compbiomed.2020.103963. Epub 2020 Aug 13.

Abstract

Knowledge management tools that assist in the systematic review and exploration of scientific knowledge are of obvious potential importance in evidence-based medicine, and also in the design of therapeutics based on the protein subsequences and fold motifs of virus proteins, as considered here. Rapid access to bundles (clusters) of related elements of knowledge gathered from diverse sources on the Internet and from growing knowledge repositories seems particularly helpful when exploring less obvious therapeutic targets in viruses (for which knowledge new to the researcher is important), and when using the following concept. Subsequences of the amino acid residue sequences of proteins that are conserved across strains and species are (a) more likely to be important targets and (b) less likely to exhibit escape mutations that would make them resistant to vaccines and therapeutic agents. However, the terms "conserved" and even "highly conserved" as used by authors are matters of degree, depending on how distant from SARS-CoV-2 they wished to go in comparing other sequences. The binding site for the human ACE2 protein as virus receptor and the binding site for the human antibody CR3022 on the spike glycoprotein are rather variable by the criteria used in the present and preceding studies. To look for more strongly conserved targets, open reading frames of SARS-CoV-2 were examined for extremely highly conserved regions, meaning regions recognizable across many viruses and organisms. Most prominent is a motif found in SARS-CoV-2 non-structural protein 3 (Nsp3). It relates to a fold type called the macro domain and has a remarkably wide distribution across organisms, including humans, with significant homologies involving three especially conserved subsequences: (a) VVVNAANVYLKHGGGVAGALNK, (b) LHVVGPNVNKG, and (c) PLLSAGIFG. Careful study of the variations of these and of the more variable sequences between and around them might provide a finer "scalpel" to ensure inhibition of a vital function of the virus without impairing the functions of related host macro domains.
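
As a rough illustration of what "matters of degree" can mean in practice, the sketch below scores how closely each of the three subsequences quoted above is reproduced in another protein sequence, using a simple ungapped sliding window and percent identity. This is not the method used in the study; the function name `best_ungapped_match` and the toy sequence are hypothetical, and a real comparison would use alignment tools and actual database sequences.

```python
# Minimal sketch (assumption: ungapped identity is an adequate first look;
# the study itself does not specify this procedure).

# The three conserved subsequences quoted in the abstract.
MOTIFS = {
    "a": "VVVNAANVYLKHGGGVAGALNK",
    "b": "LHVVGPNVNKG",
    "c": "PLLSAGIFG",
}


def best_ungapped_match(motif, sequence):
    """Return (start, identity) for the window of `sequence` most similar
    to `motif`, or None if the sequence is shorter than the motif.
    Identity is the fraction of positions with the same residue."""
    m = len(motif)
    if len(sequence) < m:
        return None
    best_start, best_identity = 0, -1.0
    for start in range(len(sequence) - m + 1):
        window = sequence[start:start + m]
        matches = sum(1 for x, y in zip(motif, window) if x == y)
        identity = matches / m
        if identity > best_identity:  # keep the first best window
            best_start, best_identity = start, identity
    return best_start, best_identity


# Hypothetical toy sequence (NOT real data): arbitrary flanking residues
# around a copy of motif (b) carrying a single V -> A substitution.
toy_sequence = "MKT" + "LHVVGPNANKG" + "EDIQ"

for label, motif in MOTIFS.items():
    print(label, best_ungapped_match(motif, toy_sequence))
```

Where a motif is longer than the sequence being searched, the function returns `None` rather than a score. A threshold on the identity fraction would then be one (arbitrary) way of deciding whether a subsequence counts as "conserved" in a given relative.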

Keywords: Bioinformatics; COVID-19; Conservation; Coronavirus; Knowledge management; Macro domain; Mutations; SARS-CoV-2; Therapeutic; X domain.

MeSH terms

  • Amino Acid Sequence
  • Artificial Intelligence*
  • Betacoronavirus
  • Binding Sites
  • COVID-19
  • Computational Biology / methods*
  • Conserved Sequence / genetics*
  • Coronavirus Infections* / drug therapy
  • Coronavirus Infections* / virology
  • Coronavirus Papain-Like Proteases
  • Drug Development
  • Humans
  • Pandemics*
  • Pneumonia, Viral* / drug therapy
  • Pneumonia, Viral* / virology
  • SARS-CoV-2
  • Viral Nonstructural Proteins* / antagonists & inhibitors
  • Viral Nonstructural Proteins* / chemistry
  • Viral Nonstructural Proteins* / genetics

Substances

  • Viral Nonstructural Proteins
  • Coronavirus Papain-Like Proteases
  • papain-like protease, SARS-CoV-2