J Virol, 85 (19), 9863-76

Widespread Endogenization of Densoviruses and Parvoviruses in Animal and Human Genomes

Huiquan Liu et al. J Virol.

Abstract

Parvoviruses infect humans and a broad range of animals, from mammals to crustaceans, and generally are associated with a variety of acute and chronic diseases. However, many others cause persistent infections and are not known to be associated with any disease. Viral persistence is likely related to the ability to integrate into the chromosomal DNA and to establish a latent infection. However, there is little evidence for genome integration of parvoviral DNA except for Adeno-associated virus (AAV). Here we performed a systematic search for homologs of parvoviral proteins in publicly available eukaryotic genome databases followed by experimental verification and phylogenetic analysis. We conclude that parvoviruses have frequently invaded the germ lines of diverse animal species, including mammals, fishes, birds, tunicates, arthropods, and flatworms. The identification of orthologous endogenous parvovirus sequences in the genomes of humans and other mammals suggests that parvoviruses have coexisted with mammals for at least 98 million years. Furthermore, some of the endogenized parvoviral genes were expressed in eukaryotic organisms, suggesting that these viral genes are also functional in the host genomes. Our findings may provide novel insights into parvovirus biology, host interactions, and evolution.

Figures

Fig. 1.
Schematic representation of some PRDs and their most related viruses. Arrowhead boxes indicate viral-like genes (red, nonstructural proteins; blue, structural proteins). Green rectangular boxes indicate transposable elements. Colored sectors connect corresponding homologous regions, and the percent amino acid identity scores are indicated. Wavy and vertical lines within boxes indicate sequences containing frameshifts and stop codons compared with viral genes, respectively. Black arrowheads indicate primers which were used to amplify the junctions between PRD and host sequences. See Table S1 in the supplemental material for PCR primer sequences and their chromosomal locations.
Fig. 2.
Schematic representation and alignment of a PRD in the African savanna elephant genome and Bovine adeno-associated virus (BAAV). Sequence alignment of 5′ untranslated regions (1) and predicted amino acid sequences (2) of BAAV Rep and PRD are shown. Conserved nucleotides (amino acid residues) are shaded in orange. Green interrupted rectangular boxes indicate transposable elements, the length of which was not drawn to scale. Colored sectors connect corresponding homologous regions; percent nucleotide or amino acid identities are indicated.
Fig. 3.
PCR using animal total DNAs. PCR products were fractionated by gel electrophoresis on 1% agarose gels and stained with ethidium bromide. Marker, DNA marker DL 2000. Arrowheads indicate bands of the expected sizes in lanes with more than one band. The sequences of bands of the expected sizes from guinea pig, horse, D. sechellia NS, D. persimilis, mouse, pig, rabbit, rat NSCP-2, and cat were deposited in GenBank under accession numbers HM469386 to HM469391 and HM989956 to HM989958.
Fig. 4.
Phylogenetic trees of exogenous parvoviruses and animal PRDs. (A and B) NS and CP trees of vertebrate parvoviruses and their related PRDs, respectively. The trees were rooted with the densovirus-like Penaeus monodon hepatopancreatic parvovirus. The node of the orthologous PRD clade is marked by a red diamond, and the relevant hosts are indicated by a blue arc in the middle of the tree (B). (C and D) NS and CP trees of arthropod parvoviruses and their related PRDs, respectively. The trees were rooted with the parvovirus Aleutian mink disease virus. Only P values of the approximate likelihood ratios (SH test) of >0.5 (50%) are indicated. All scale bars correspond to 0.5 amino acid substitutions per site. The PRD branches are printed in red. The taxon names of PRDs possibly derived from recent integration events are shaded in green (see details in text). Animals belonging to the same group are indicated to the right. The sequence accession number is given for each sequence.
Fig. 5.
Identification of a syntenic PRD locus in mammal genomes. (A) Schematic representation of the human limbin gene structure. Vertical blue bars indicate putative exons; arcs indicate putative introns. The region of the PRD and flanking sequence is marked with a red rectangular box. (B) The PRD and flanking sequence in the human genome were aligned with the orthologous regions of other mammals using BLASTn. Colored bars indicate the similarity level between human sequences and other mammal sequences as measured by BLAST scores. Asterisks indicate that sequences are truncated due to sequencing gaps. Note that the African savanna elephant and mouse did not contain PRDs. See Table S2 in the supplemental material for accession numbers and positions of mammal sequences used for analysis. SINE, short interspersed repetitive element. (C) Phylogenetic tree of orthologous PRD regions in mammal genomes. The phylogenetic tree was constructed by the neighbor-joining method using the maximum composite likelihood substitution model with the pairwise deletion option in MEGA4 (45). The bootstrap probability is indicated for each interior branch. The scale bar indicates the number of nucleotide substitutions per site. The tree is midpoint rooted, and its topology is consistent with the phylogeny of mammals (9, 39). (D) Alignment of putative amino acid sequences of the human PRD and its best-matched virus. The default color scheme for ClustalW alignment in the Jalview program was used. Percent amino acid identity is indicated. The red asterisks and triangle indicate predicted stop codons and frameshift sites in the human PRD, respectively. NHP_AAV_CP, capsid protein of nonhuman primate adeno-associated virus (AAO88189.1).
Fig. 6.
Schematic representation of some PRDs and their expressed sequences. Colored boxes with arrowheads and swallowtails indicate ORFs with and without start codons, respectively. Red, nonstructural proteins; blue, structural proteins. Green rectangular boxes indicate transposable elements. Wavy and vertical lines within boxes indicate sequences containing frameshifts and stop codons compared with viral genes, respectively. Similar regions of expressed sequences are identified, and the percent nucleotide identity with PRDs is indicated. Note that the full-length sequence corresponding to a parvovirus-related cDNA (EW905967) containing repeated sequences was not identified in the genomic database but occurred in the Trace-WGS database (G), suggesting that some trace records containing expressed PRDs remain to be assembled into genomic contigs. In addition, two parvovirus-related cDNAs (DY223558 and DY224604) containing repeated sequences did not have corresponding full-length sequences in either the genomic database or the Trace-WGS database (H), and the parvovirus-related cDNAs contained rearranged structures relative to genomic sequences (I), which could possibly be due to incomplete genomic sequencing or because sequencing did not cover some regions containing PRDs.
Fig. 7.
Structural and expression analysis of an endogenous densovirus in the genome of Drosophila sechellia. Colored arrowhead boxes indicate virus-like ORFs. Red, nonstructural proteins; blue, structural proteins; other colors, hypothetical proteins. Gray sectors connect corresponding homologous regions detected by BLASTp; percent amino acid identities are indicated. Black arrows indicate primers which were used to amplify and validate the connections. The sequence of the transposable element-densovirus-like gene boundary is shown above the diagram at the left. Blue bars represent the matched regions of expressed sequences of the endogenous densovirus; arcs indicate introns.
