Feature-extraction and analysis based on spatial distribution of amino acids for SARS-CoV-2 Protein sequences

Comput Biol Med. 2022 Feb:141:105024. doi: 10.1016/j.compbiomed.2021.105024. Epub 2021 Nov 10.

Abstract

Background and objective: The world is currently facing a global emergency due to COVID-19, which requires immediate strategies to strengthen healthcare facilities and prevent further deaths. To achieve effective remedies and solutions, research on different aspects, including the genomic and proteomic level characterization of SARS-CoV-2, is critical. In this work, the spatial representation (composition) and frequency distribution of the 20 amino acids across the primary protein sequences of SARS-CoV-2 were examined according to different parameters.

Method: To characterize the spatial distribution of amino acids over the primary protein sequences of SARS-CoV-2, the Hurst exponent and Shannon entropy were applied as parameters to quantify, respectively, the autocorrelation and the amount of information in the spatial representations. The frequency distribution of each amino acid over the protein sequences was also evaluated. Because each spatial representation is a one-dimensional sequence, the Hurst exponent (HE) was used to characterize fractality through its linear relationship with the fractal dimension (D), i.e. D + HE = 2. Moreover, binary Shannon entropy was used to measure the uncertainty in a binary sequence and was then applied to calculate amino acid conservation in the primary protein sequences.
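The quantities named in the Method paragraph can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: each amino acid is encoded as a 0/1 indicator sequence over the protein, the Hurst exponent is estimated with a classical rescaled-range (R/S) fit over window sizes that double from `min_window`, and the function names and the toy peptide are hypothetical. The abstract does not state which Hurst estimator or encoding was actually used.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def indicator(seq, aa):
    """Binary spatial representation (assumed encoding): 1 where residue == aa."""
    return [1 if residue == aa else 0 for residue in seq]


def binary_shannon_entropy(bits):
    """H = -p*log2(p) - (1-p)*log2(1-p), where p is the fraction of ones."""
    if not bits:
        raise ValueError("empty sequence")
    p = sum(bits) / len(bits)
    if p == 0.0 or p == 1.0:
        return 0.0  # a constant sequence carries no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def rescaled_range(chunk):
    """R/S statistic of one window; None if the window is constant."""
    n = len(chunk)
    mean = sum(chunk) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in chunk) / n)
    if std == 0:
        return None
    cum = lo = hi = 0.0
    for x in chunk:
        cum += x - mean  # cumulative deviation from the window mean
        lo, hi = min(lo, cum), max(hi, cum)
    return (hi - lo) / std


def hurst_exponent(series, min_window=8):
    """Estimate HE as the least-squares slope of log(R/S) against log(window)."""
    n = len(series)
    xs, ys = [], []
    w = min_window
    while w <= n // 2:
        values = []
        for start in range(0, n - w + 1, w):
            rs = rescaled_range(series[start:start + w])
            if rs is not None:
                values.append(rs)
        if values:
            xs.append(math.log(w))
            ys.append(math.log(sum(values) / len(values)))
        w *= 2
    if len(xs) < 2:
        return None  # sequence too short for a slope estimate
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Hypothetical toy peptide, not a real SARS-CoV-2 protein.
toy = "MKVLAAGLLKAAMKVLLAGAKKLMAVLGAAKLMKVAAL" * 4
for aa in "AKL":
    bits = indicator(toy, aa)
    he = hurst_exponent(bits)
    d = None if he is None else 2 - he  # fractal dimension from D + HE = 2
    print(aa, sum(bits) / len(bits), binary_shannon_entropy(bits), he, d)
```

R/S estimates are noisy on short sequences, so values from a sketch like this are best read as relative comparisons between amino acids or proteins rather than as precise exponents.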

Results and conclusion: Fourteen (14) SARS-CoV protein sequences were evaluated and compared with 105 SARS-CoV-2 proteins. The simulation results demonstrate differences in the collected information about the amino acid spatial distribution in the SARS-CoV-2 and SARS-CoV proteins, enabling researchers to distinguish between the two types of CoV. The spatial arrangement of amino acids also reveals similarities and dissimilarities among the important structural proteins E, M, N and S, which is pivotal for establishing an evolutionary tree with other CoV strains.

Keywords: Amino acid; Frequency distribution; Hurst exponent; SARS-CoV-2; Shannon entropy.

MeSH terms

  • Amino Acid Sequence
  • Amino Acids
  • COVID-19*
  • Humans
  • Proteomics
  • SARS-CoV-2*

Substances

  • Amino Acids