, 7 (7), e1002146

Widespread Endogenization of Genome Sequences of Non-Retroviral RNA Viruses Into Plant Genomes


Sotaro Chiba et al. PLoS Pathog.


Non-retroviral RNA virus sequences (NRVSs) have been found in the chromosomes of vertebrates and fungi, but not plants. Here we report similarly endogenized NRVSs derived from positive-, negative-, and double-stranded RNA viruses in plant chromosomes. These sequences were found by searching public genomic sequence databases, and, importantly, most NRVSs were subsequently detected by direct molecular analyses of plant DNAs. The most widespread NRVSs were related to the coat protein (CP) genes of members of the family Partitiviridae, which have bisegmented dsRNA genomes and include plant- and fungus-infecting members. The CP of a novel fungal virus (Rosellinia necatrix partitivirus 2, RnPV2) had the greatest sequence similarity to Arabidopsis thaliana ILR2, which is thought to regulate the activities of the phytohormone auxin, indole-3-acetic acid (IAA). Furthermore, partitivirus CP-like sequences much more closely related to plant partitiviruses than to RnPV2 were identified in a wide range of plant species. In addition, the nucleocapsid protein genes of cytorhabdoviruses and varicosaviruses were found in species from more than nine plant families, including Brassicaceae and Solanaceae. A replicase-like sequence of a betaflexivirus was identified in the cucumber genome. The pattern of occurrence of NRVSs and the phylogenetic analyses of NRVSs and related viruses indicate that multiple independent integrations into many plant lineages may have occurred. For example, one of the NRVSs was retained in Ar. thaliana but not in Ar. lyrata or other related Camelina species, whereas another NRVS displayed the reverse pattern. Our study shows that single- and double-stranded RNA viral sequences are widespread in plant genomes and that genome-integrated NRVSs have the potential to help resolve unclear phylogenetic relationships among plant species.

Conflict of interest statement

The authors have declared that no competing interests exist.


Figure 1. ILR2 (PCLS1) homologs from members of the family Brassicaceae.
(A) Schematic representation of RnPV2 CP-related ILR2 genes from Arabidopsis-related species. Green boxes refer to the coding regions of ILR2 homologs, while orange and blue thick arrows indicate those of cellular genes. Ar. thaliana Col-0, No-0, C24 and Shokei have long versions of ILR2, while those of the other Ar. thaliana ecotypes and Ar. lyrata have large deletions at the 5′-terminal portion. ILR2 homologs of Arabidopsis and closely related genera reside at orthologous positions. These plant homologs were most closely related to the CP gene of a fungal partitivirus, RnPV2. Symbols referring to mutations are shown at the bottom: waved line, major deletion; dashed line, undetermined sequence; open triangle, nucleotide insertion; filled triangle, small deletion (<30 nt); asterisk, internal stop codon; F, frame-shift; filled diamond, transcription start site; open diamond, poly(A) addition site. These symbols are used in this and subsequent figures. (B) Genomic PCR analysis of ILR2. The top and middle panels show amplification patterns with two primer sets (PC-1 and PC-2; PC-1 and At-1R). Primer positions and sequences are shown in Figure 1A and Table S3. The primer set At-IRS-FW (ITS-F) and At-IRS-RV (ITS-R) was used for amplification of the complete ribosomal internal transcribed spacer (ITS) regions 1 and 2, including the 5.8S rDNA. (C) Southern blotting of plant species in different families. Ten micrograms of EcoRI-digested genomic DNA (per lane), except for that from Ar. thaliana Col-0 (2.5 µg/lane), was probed with a DIG-labeled ILR2 (top panel) or ITS DNA fragment (bottom panel) derived from Ar. thaliana Col-0. (D) Genomic PCR analysis of the ILR2-flanking region. PCR fragments were amplified with a primer set (At-1F and At-1R) from ILR2-carrying genomic DNAs of Ar. thaliana, Ar. lyrata, Cap. bursa-pastoris, and O. korshinskyi, and from ILR2-non-carrying DNAs of Cru. lasiocarpa, Sis. irio, and B. rapa.
Figure 2. PCLS2 and PCLS3 homologs from members of the genus Arabidopsis.
(A) Diagrams of the plant genome map containing PCLS2 and PCLS3 from Arabidopsis-related species. See Figure 1 legend for explanation of symbols. AtPCLS2 and AlPCLS3 showed the highest levels of similarity to the CPs of the plant partitiviruses Raphanus sativus cryptic virus 2 (RSCV2) and Fragaria chiloensis cryptic virus (FCCV), respectively. (B) Genomic PCR analysis of PCLS2 and PCLS3. PCLS2 homologs were amplified using primer sets PC2-1 and PC2-2 (top panel) and PC2-1 and At-2 (second panel). These primers are specific for AtPCLS2 except for At-2, which corresponds to an F-box protein gene (At4g14103). The third and fourth panels show amplification patterns of PCLS3 with primer sets PC3-1 and PC3-2 or Al-3 and PC3-2, respectively. The primer set At-IRS-FW and At-IRS-RV (abbreviated ITS-F and ITS-R; see the Figure 1 legend) was used in this and subsequent figures (Figures 3, 5, S1, S3) for amplification of the complete ITS regions. Primer positions are shown by small arrows in A, and their sequences are listed in Table S3.
Figure 3. PCLS4 and PCLS5 homologs from members of the families Solanaceae and Brassicaceae.
(A) Diagrams of the structures of B. rapa PCLS4 (BrPCLS4), and PCLS5s from Sol. phureja (SpPCLS5) and from B. rapa (BrPCLS5). PCLS4 shows the highest similarity to the carrot cryptic virus 1 (CaCV1) CP, while the PCLS5s exhibit the greatest sequence similarity to the CP of another plant partitivirus, Raphanus sativus cryptic virus 1 (RSCV1). (B) Genomic PCR analysis of PCLS4 and PCLS5. Genomic DNAs from the members of the families Brassicaceae and Solanaceae shown at the top of the gels were used for amplification of PCLSs. Primers used were: PC4-1 and PC4-2 specific for BrPCLS4 (top panel); PC5a-1 and PC5a-2 specific for BrPCLS5 (second panel); PC5b-1 and PC5b-2 specific for SpPCLS5 (third panel); PC5b-1 and SP-5 specific for SpPCLS5 and PUX_4 (fourth panel); ITS-F and ITS-R specific for the ITS region (bottom panel). (C) Genomic Southern blotting of PCLS4 and PCLS5. EcoRI-digested genomic DNAs isolated from the plants shown at the top of the blots were hybridized with DIG-labeled probes specific for BrPCLS4 (top panel), BrPCLS5 (second panel), B. rapa ITS (third panel), Sol. tuberosum PCLS5 (fourth panel) and Sol. tuberosum ITS (bottom panel). Migration positions of DNA size standards are shown at the left.
Figure 4. Molecular phylogenetic analysis of partitivirus CPs and plant PCLSs.
A phylogenetic tree was generated based on an alignment (see Figure S2) of the entire region of partitivirus CP-related sequences. Analyzed sequences were from 7 fungal partitiviruses (shown in red), 10 plant partitiviruses (in blue), 1 F. pratensis EST-derived sequence (shown in purple), 4 accessions of Ar. thaliana, and 16 other plant species (in green) (See Tables 1 and S4 for their descriptions). The assembled sequence from the F. pratensis ESTs in the database is believed to be of plant-infecting partitivirus origin because the library contains EST entries of RdRp sequences and some had interrupted poly(A) tails typical of a partitiviral mRNA. Viruses analyzed phylogenetically are: Rosellinia necatrix partitivirus 2, RnPV2; Sclerotinia sclerotiorum partitivirus S, SsPV-S; Chondrostereum purpureum cryptic virus, CPCV; Raphanus sativus cryptic virus 1, RSCV1; white clover cryptic virus 1, WCCV1; vicia cryptic virus, VCV; carrot cryptic virus 1, CaCV1; beet cryptic virus 1, BCV1; Amasya cherry disease associated partitivirus, ACD-PV; cherry chlorotic rusty spot-associated partitivirus, CCRS-PV; Heterobasidion RNA virus 3, HetRV3; Flammulina velutipes browning virus, FvBV; beet cryptic virus 2, BCV2; Raphanus sativus cryptic virus 3, RSCV3; Fragaria chiloensis cryptic virus, FCCV; rose cryptic virus 1, RoCV; Raphanus sativus cryptic virus 2, RSCV2. Note that RSCV1 CP gene and RSCV1 dsRNA3, BCV2 dsRNA2 and 3, and RSCV2 dsRNA2 and 3 are assumed to be from two independent viruses although the same virus name was assigned to the segments in the database. Numbers at the branches show aLRT values using an SH-like calculation (only values greater than 0.5 are shown). The scale bar represents the relative genetic distance (number of substitutions per nucleotide).
Figure 5. Negative-strand RNA virus-related sequences (RNLSs) from plant nuclear genomes.
(A) Genome organization of a varicosavirus, lettuce big-vein associated virus (LBVaV), and a cytorhabdovirus, lettuce necrotic yellows virus (LNYV). While LBVaV and LNYV have a bipartite and a monopartite genome architecture, respectively, both viruses share similarities in terminal sequence features such as leader (le) and trailer (tr) sequences, genome expression strategy and sequences of encoded proteins (e.g., CP vs. N and L vs. L). (B) Schematic representation of RNLSs and their flanking regions. An RNLS found in the genome sequence database of B. rapa (BrRNLS1) is shown to match the CP of LBVaV. Another RNLS from N. tabacum (NtRNLS2) showed the greatest similarity to the LNYV N protein. (C, E) Genomic PCR analysis of RNLS1 and RNLS2. Template genomic DNAs from the plant species shown at the top of the gel were used to amplify RNLS1 (C, top panel), RNLS2 (E, top panel) or ribosomal RNA ITS regions (C and E, bottom panels). Primer pairs RN1a-1 and RN1a-2, RN2-1 and RN2-2, and ITS-F and ITS-R were used to amplify RNLS1, RNLS2, and the ITS regions, respectively. Amplified DNA fragments were electrophoresed in a 1.0% agarose gel in TAE. (D, F) Southern blot analyses of plant species in different families. The same DNA preparations as for Figure 1 were used for detection of RNLS1 (D) and RNLS2 (F), in which DIG-labeled DNA fragments spanning BrRNLS1, NtRNLS2, and N. tabacum ITS served as probes, respectively. See Figure 3C for hybridization with a B. rapa ITS DNA probe.
Figure 6. Phylogenetic analyses of the nucleocapsid protein sequences of rhabdoviruses and RNLSs.
Phylogenetic relation of nucleocapsid proteins of negative strand RNA viruses and plant RNLSs. A phylogenetic tree was constructed using PhyML 3.0 based on the multiple amino acid sequence alignments of entire regions of rhabdovirus nucleocapsid protein-related sequences shown in Figure S5. Plant RNLSs, N (CP) proteins from negative-strand RNA viruses, and EST-derived sequences are shown in green, blue and purple, respectively. Viruses analyzed phylogenetically are: tobacco stunt virus, TStV; lettuce big-vein associated virus, LBVaV; lettuce yellow mottle virus, LYMoV; lettuce necrotic yellows virus, LNYV; northern cereal mosaic virus, NCMV; potato yellow dwarf virus, PYDV; orchid fleck virus, OFV; sonchus yellow net virus, SYNV. Numbers at the branches show aLRT values using an SH-like calculation (only values greater than 0.5 are shown).
Figure 7. Plant genome sequence related to positive-strand RNA virus.
(A) Chromosomal position of the flexivirus replicase-like sequence (FRLS) found in the cucumber ‘Chinese long’ inbred line 9930 and the genome structure of a positive-sense RNA virus, citrus leaf blotch virus (CLBV). A sequence related to the 5′ terminal half of the CLBV genome (CsFRLS1) is detected in scaffold 507. Genes for small potential ORFs (Cucsa 038520 and 038540) reside near CsFRLS1, as does a retrotransposon-like sequence (shown by thick black lines). Three short sequences identical to CsFRLS1 are found in the GSS database (NCBI) from a different cucumber line, ‘Borszczagowski’ B10 (shown by dashed bars above CsFRLS1 in red). Functional domains of the CLBV replicase polyprotein are indicated in ocher: Met, methyltransferase; AlkB, Fe(II)/2OG-dependent dioxygenase superfamily domain; peptidase; Hel, RNA helicase; RdRp, RNA-dependent RNA polymerase. (B) Detection of CsFRLS1 by genomic PCR. See Materials and Methods for DNA isolation and PCR conditions. Template genomic DNA was prepared from the cucumber cultivar ‘Borszczagowski’ line B10 (top panel) and Citrullus lanatus (watermelon) (bottom panel). The primers used (FR1-1 to FR1-6, FR1-6*, CS-1F, and CS-1R) are shown at the top of the panel. The positions of the primers are shown by arrows below CsFRLS1 in A, except for the primer pair used for amplification of the ITS region (ITS-F and ITS-R) (Table S3). (C) Phylogenetic analysis of CsFRLS1. CsFRLS1 and corresponding amino acid sequences from plant flexiviruses, including members of the genera Citrivirus, Carlavirus, Foveavirus, Vitivirus, Capillovirus, Trichovirus and Potexvirus, were aligned using the MAFFT program (Figure S6). The alignment was then used to generate a phylogram. Numbers at the branches show aLRT values using an SH-like calculation (only values greater than 0.5 are shown).
Figure 8. Horizontal gene transfer of genome sequences of non-retroviral RNA viruses into plant genomes.
The cladogram was created based on previous reports by The Angiosperm Phylogeny Group (31), Oyama et al., Udvard et al. and Phytozome. Plants whose integrated non-retroviral RNA virus sequences (NRVSs) were analyzed molecularly in this study are shown in red. Integrations of non-retroviral RNA virus sequences (PCLSs, RNLSs, and FRLS) are shown next to the plant species retaining them. Presumed integration times of NRVSs are indicated by dots on the nodes. Numbers within the genome-integrated NRVSs refer to subgroups (possible different virus origins) of PCLS (yellow column), RNLS (blue column) and FRLS (pink column) (Tables 1, 2, S4, S5). Numbers are placed within or beneath the symbolized morphologies of viruses that are thought to be the source of integrations (spherical for partitivirus, PCLS; bacilliform for rhabdovirus, RNLS; flexuous for betaflexivirus, FRLS).

