Statistics and Patterns of Occurrence of Simple Tandem Repeats in SARS-CoV-1 and SARS-CoV-2 Genomic Data

Data Brief. 2021 Jun:36:107057. doi: 10.1016/j.dib.2021.107057. Epub 2021 Apr 21.

Abstract

The data presented in this article are related to the research article entitled "Developing an ultra-efficient microsatellite discoverer to find structural differences between SARS-CoV-1 and Covid-19" [Naghibzadeh et al. 2020]. Simple tandem repeats (STRs, also called microsatellites) are extracted and investigated across all viral families from four main viral realms. The STRs are extracted with ultra-efficient and reliable software that was recently developed by the authors and published in the above-mentioned article. The analysis covers k-mer tandem repeats with k ranging from one to seven. In particular, the frequency of trimer STRs is shown to be lower in RNA viruses than in DNA viruses. Special attention is paid to seven zoonotic viruses from the family Coronaviridae, which caused several severe human crises during the last two decades, including MERS, SARS 2003 and Covid-19.
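To make the notion of a k-mer tandem repeat concrete, the following is a minimal Python sketch of a naive scan for k from one to seven. It is not the authors' software and makes no claim to its efficiency. The function names `find_tandem_repeats` and `count_strs`, the `min_copies` threshold of three, and the toy sequence are assumptions introduced only for illustration.

```python
def find_tandem_repeats(seq, k, min_copies=3):
    """Return (start, motif, copies) for non-overlapping tandem runs of a k-mer.

    Assumption: a run counts as an STR when the motif occurs at least
    `min_copies` times back to back; the threshold is illustrative only.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i + k <= n:
        motif = seq[i:i + k]
        # Skip motifs that are repeats of a shorter unit (e.g. "AA" or "ATAT"),
        # so that each run is reported only under its shortest period.
        if any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)):
            i += 1
            continue
        copies = 1
        j = i + k
        while seq[j:j + k] == motif:
            copies += 1
            j += k
        if copies >= min_copies:
            hits.append((i, motif, copies))
            i = j  # jump past the run to avoid phase-shifted duplicates
        else:
            i += 1
    return hits


def count_strs(seq, max_k=7, min_copies=3):
    """Count tandem-repeat runs for each motif length k = 1..max_k."""
    return {k: len(find_tandem_repeats(seq, k, min_copies))
            for k in range(1, max_k + 1)}


# Toy sequence (hypothetical, not taken from any viral genome).
toy = "GG" + "AT" * 4 + "CC" + "A" * 5 + "GTT" + "CAG" * 3 + "T"
for k in (1, 2, 3):
    print(k, find_tandem_repeats(toy, k))
```

On the toy sequence this reports one monomer run (A), one dimer run (AT), and one trimer run (CAG). Real genome-scale analysis would need a far more efficient algorithm than this quadratic-time scan.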

Keywords: RNA data analysis; SARS-CoV-1; SARS-CoV-2; Tandem repeats.