J Virol. 2021 Jul 12;95(15):e0029421.
doi: 10.1128/JVI.00294-21. Epub 2021 Jul 12.

Host-Virus Chimeric Events in SARS-CoV-2-Infected Cells Are Infrequent and Artifactual

Bingyu Yan et al. J Virol. 2021.

Abstract

The pathogenic mechanisms underlying severe SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) infection remain largely unelucidated. High-throughput sequencing technologies that capture genome and transcriptome information are key approaches to gain detailed mechanistic insights from infected cells. These techniques readily detect both pathogen- and host-derived sequences, providing a means of studying host-pathogen interactions. Recent studies have reported the presence of host-virus chimeric (HVC) RNA in transcriptome sequencing (RNA-seq) data from SARS-CoV-2-infected cells and interpreted these findings as evidence of viral integration in the human genome as a potential pathogenic mechanism. Since SARS-CoV-2 is a positive-sense RNA virus that replicates in the cytoplasm, it does not have a nuclear phase in its life cycle. Thus, it is biologically unlikely to be in a location where splicing events could result in genome integration. Therefore, we investigated the biological authenticity of HVC events. In contrast to true biological events like mRNA splicing and genome rearrangement events, which generate reproducible chimeric sequencing fragments across different biological isolates, we found that HVC events across >100 RNA-seq libraries from patients with coronavirus disease 2019 (COVID-19) and infected cell lines were highly irreproducible. RNA-seq library preparation is inherently error prone due to random template switching during reverse transcription of RNA to cDNA. By counting chimeric events observed when constructing an RNA-seq library from human RNA and spiked-in RNA from an unrelated species, such as the fruit fly, we estimated that ∼1% of RNA-seq reads are artifactually chimeric. In SARS-CoV-2 RNA-seq, we found that the frequency of HVC events was, in fact, not greater than this background "noise." Finally, we developed a novel experimental approach to enrich SARS-CoV-2 sequences from bulk RNA of infected cells. 
This method enriched viral sequences but did not enrich HVC events, suggesting that the majority of HVC events are, in all likelihood, artifacts of library construction. In conclusion, our findings indicate that HVC events observed in RNA-sequencing libraries from SARS-CoV-2-infected cells are extremely rare and are likely artifacts arising from random template switching of reverse transcriptase and/or sequence alignment errors. Therefore, the observed HVC events do not support SARS-CoV-2 fusion to cellular genes and/or integration into human genomes.

IMPORTANCE

The pathogenic mechanisms underlying SARS-CoV-2, the virus responsible for COVID-19, are not fully understood. In particular, relatively little is known about the reasons some individuals develop life-threatening or persistent COVID-19. Recent studies identified host-virus chimeric (HVC) reads in RNA-sequencing data from SARS-CoV-2-infected cells and suggested that HVC events support potential "human genome invasion" and "integration" by SARS-CoV-2. This suggestion has fueled concerns about the long-term effects of current mRNA vaccines that incorporate elements of the viral genome. SARS-CoV-2 is a positive-sense, single-stranded RNA virus that does not encode a reverse transcriptase and does not include a nuclear phase in its life cycle, so some doubts have rightfully been expressed regarding the authenticity of HVCs and the role played by endogenous retrotransposons in this phenomenon. Thus, it is important to independently authenticate these HVC events. Here, we provide several lines of evidence suggesting that the observed HVC events are likely artifactual.
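The comparison between the HVC frequency and the spike-in background "noise" comes down to two proportions. The following is a minimal sketch of that arithmetic, not the authors' pipeline: the class, the function names, and every read count are hypothetical placeholders chosen only to illustrate the calculation.

```python
from dataclasses import dataclass


@dataclass
class LibraryCounts:
    """Read counts for one RNA-seq library (all values used below are made up)."""
    foreign_reads: int   # reads mapped to the non-host genome (virus or spike-in)
    chimeric_reads: int  # reads that join host and non-host sequence


def chimeric_fraction(counts: LibraryCounts) -> float:
    """Chimeric reads as a proportion of reads mapped to the non-host genome."""
    if counts.foreign_reads == 0:
        raise ValueError("no reads mapped to the non-host genome")
    return counts.chimeric_reads / counts.foreign_reads


def exceeds_background(sample: LibraryCounts, background: LibraryCounts) -> bool:
    """True only if the sample's chimeric fraction is above the background rate."""
    return chimeric_fraction(sample) > chimeric_fraction(background)


# Hypothetical spike-in control: about 1% of foreign reads are chimeric.
spike_in = LibraryCounts(foreign_reads=200_000, chimeric_reads=2_000)
# Hypothetical infected-cell library with a lower chimeric fraction.
infected = LibraryCounts(foreign_reads=1_000_000, chimeric_reads=5_000)

print(f"background: {chimeric_fraction(spike_in):.3%}")
print(f"infected:   {chimeric_fraction(infected):.3%}")
print("above background:", exceeds_background(infected, spike_in))
```

A real analysis would also need a statistical test across libraries rather than a single point comparison; this sketch only shows which quantities are being compared.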

Keywords: COVID-19; RNA sequencing; SARS-CoV-2; chimeric reads; host-virus fusion; sequencing reads.

Figures

FIG 1
HVC events are detectable in RNA-seq from SARS-CoV-2-infected cells but infrequently in samples from COVID-19 patients. (A) Schematic presentation of RNA-sequencing data analysis pipeline. (B) Viral reads in the indicated SARS-CoV-2-infected or other virally infected cells as a proportion of the total reads mapped to the chimeric genome. (C) HVC reads in the indicated SARS-CoV-2-infected or other virally infected cells as a proportion of the total reads mapped to the virus genome. (D) SARS-CoV-2 genome coverage based on reads mapping perfectly to the virus genome (top) or to the viral segments of HVC events (bottom). (E) Violin plots showing the expression of all human genes with or without HVC events in the indicated infected cells. *, P < 0.05, and ****, P < 0.0001, by Kruskal-Wallis and FDR correction. (F) Dot plots showing the expression of all human genes in SARS-CoV-2-infected A549-ACE2 cells ordered by gene expression level. Genes with or without HVC events are highlighted with red and blue, respectively. See Tables 1 and 2 for the sources of data in this figure. TPM, transcripts per million.
FIG 2
HVC events are not reproducible and have frequencies comparable to those of artifactual chimeric events. (A, B) Representative Venn diagrams (A) and cumulative data (B) comparing known splicing, novel splicing, and HVC events across independent studies (see Table 1 for the list of independent studies used here). The accession numbers of data from representative studies used in panel A are GSE147507 and PRJNA665581/SRP285334 for Calu-3 cells, GSE147507 and GSE151803 for patient samples, GSE147507 and GSE159191 for A549 cells, and GSE147507 and GSE154613 for A549-ACE2 cells. (C) Histograms showing the numbers of reads spanning junctions of the indicated events. (D) The fractions of spiked-in Drosophila RNA detected to be chimeras with human RNA. Data are from the data set with accession number PRJNA311567. (E) Violin plots showing expression of all human genes with or without human-Drosophila chimeric events. TPM, transcripts per million. (F) Distribution of genomic features in the human segment of human-Drosophila chimeric events. (G) Distribution of genomic features in the host segment of human–SARS-CoV-2 HVC events. *, P < 0.05, **, P < 0.01, and ****, P < 0.0001, by Wilcoxon test (B, E) and FDR correction (F and G).
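The reproducibility comparison in panels A and B asks how many junction events recur across independent studies. One way to express that is a mean pairwise Jaccard overlap, sketched below under stated assumptions: each event is reduced to a hashable key, and the key layout, the study names, and the example events are all hypothetical. The caption does not say which overlap metric the authors used, so the Jaccard index here is a stand-in.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Shared events divided by all events seen in either study."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_jaccard(studies: dict) -> float:
    """Average Jaccard overlap over every pair of studies."""
    pairs = list(combinations(studies.values(), 2))
    if not pairs:
        raise ValueError("need at least two studies to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical splice junctions keyed as (chromosome, donor, acceptor).
splicing = {
    "study_A": {("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 90)},
    "study_B": {("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 60, 95)},
}
# Hypothetical HVC junctions keyed as (chromosome, host position, viral position).
hvc = {
    "study_A": {("chr1", 150, 2000)},
    "study_B": {("chr3", 10, 500)},
}

print("splicing overlap:", mean_pairwise_jaccard(splicing))
print("HVC overlap:     ", mean_pairwise_jaccard(hvc))
```

In this toy input the splice junctions mostly recur while the HVC junctions do not, which is the qualitative pattern the caption describes.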
FIG 3
Experimental enrichment for viral-RNA-containing fragments does not enrich HVC events. (A) Schematic presentation of viral-RNA enrichment from infected host cells. Cellular RNA from infected cells comprises host RNA, viral RNA, and presumably, any fusion RNA between virus and host. A pool of oligonucleotide probes specific to SARS-CoV-2 was used in a series of reverse transcription, in vitro transcription, and PCR amplification steps to amplify viral RNAs and potential host-virus (1) or virus-host (2) chimeras (see Materials and Methods). (B) Expression of N protein in control and virus-enriched (1 or 2) samples using N1 and N2 qPCR probes recommended by the CDC. (C) Viral reads in the indicated libraries from SARS-CoV-2-infected Calu-3 cells as a proportion of the total reads mapped to the chimeric genome. (D) HVC reads in the indicated libraries from SARS-CoV-2-infected Calu-3 cells as a proportion of the total reads mapped to the SARS-CoV-2 genome. (E) Distribution of genomic features in the human segment of HVC events detected after enrichment for viral-RNA-containing transcripts. *, P < 0.05, by Wilcoxon test. (F) Venn diagram comparing HVC events in Calu-3 cells from the data shown in Fig. 2A with postenrichment HVC events.

