, 3 (2), vex016
eCollection

A Novel Viral Lineage Distantly Related to Herpesviruses Discovered Within Fish Genome Sequence Data

Amr Aswad et al. Virus Evol.

Abstract

Pathogenic viruses represent a small fraction of viral diversity, and emerging diseases are frequently the result of cross-species transmissions. Therefore, we need to develop high-throughput techniques to investigate a broader range of viral biodiversity across a greater number of species. This is especially important in the context of new practices in agriculture that have arisen to tackle the challenges of global food security, including the rising number of marine and freshwater species that are used in aquaculture. In this study, we demonstrate the utility of combining evolutionary approaches with bioinformatics to mine non-viral genome data for viruses, by adapting methods from paleovirology. We report the discovery of a new lineage of dsDNA viruses that are associated with at least fifteen different species of fish. This approach also enabled us to simultaneously identify sequences that likely represent endogenous viral elements, which we experimentally confirmed in commercial salmon samples. Moreover, genomic analysis revealed that the endogenous sequences have co-opted PiggyBac-like transposable elements, possibly as a mechanism of intragenomic proliferation. The identification of novel viruses from genome data shows that our approach has applications in genomics, virology, and the development of best practices for aquaculture and farming.

Keywords: endogenous viral element; herpesvirus; metagenomics; paleovirology.

Figures

Figure 1.
Both panels depict midpoint rooted Bayesian phylogenetic trees reconstructed from an alignment of DNA polymerase. The branch lengths represent the number of substitutions per site and the numbers at each node represent posterior probabilities >85. (A) Posterior probabilities are expressed as values out of 100. As well as the sequences under investigation (annotated in purple) the 233 amino acid alignment included viruses representing eight dsDNA virus families, as well as delta, zeta and epsilon fish DNA polymerases. (B) An extended 2,904-nucleotide alignment of the new sequences without other viral groups intended to obtain more robust support for the topology within the clade. The clades are annotated according to the fish species in whose genome the viral-like data was identified. The inset cladogram shows the relationships between these fish hosts, drawn manually based on the phylogeny in Near et al. (2012).
Figure 2.
A schematic diagram depicting the blocks of co-linear sequence similarity among the viral contigs, which are drawn relative to the Salmo salar sequence. Homologous blocks across the different sequences are indicated by the same color, and shown below the line if a co-linear block is found in reverse orientation. The alignment is centered at the midpoint of DNA polymerase, which is located in slightly different places for each sequence within the co-linear block. Repetitive elements >200 bp are indicated as arrow blocks above the representation of each contig (excluding simple repeats). *In the case of Boleophthalmus pectinirostris, MAUVE was unable to identify the co-linear block containing the DNA polymerase ORFs due to the presence of large insertions not found in other contigs.
Figure 3.
Four of the viral sequences detected in fish genomes are shown here with detailed annotation; predicted open reading frames (ORFs) are represented as boxes. ORFs in the forward and reverse orientation are indicated above and below the line, respectively. ORFs without any detectable similarity to known proteins are not labeled, and those with similarity to unnamed proteins are only indicated by their ORF ID. The color-coded key indicates the viral family of the best hit for each predicted ORF. Each ORF is filled completely with the color of the taxonomic group of its most similar protein, although BLAST similarity was always partial.
Figure 4.
(A) A coverage graph across the length of the salmon contig. The y-axis represents the log coverage, with a horizontal bar indicating the mean at ∼3,000×. Short regions of zero coverage are indicated by a dash along the x-axis, all of which are bridged by read pairs, indicating that the contig is not erroneously assembled. (B) The number of read pairs for each insert size. The peaks at 180, 300 and 600 correspond to the known sizes of libraries used in the sequencing project. Only high quality, well-aligned and paired reads were included in the coverage count. (C) BLASTn hits of contig AGKD01000001.1 (inner ring) against salmon chromosomes (outer ring), showing only those over 1 kb long (and up to 9 kb) and excluding hits to the highly repetitive terminal ends. All hits are between 80% and 100% identical at the nucleotide level. The colours represent segments of the query sequence AGKD01000001.1 for clarity. The salmon chromosomes are drawn to scale with the values at tick marks representing Mb. AGKD01000001.1 is drawn much larger as it is only 194,200 bp long and would not be visible at the chromosomal scale.
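The hit selection described for panel C (hits over 1 kb, excluding the terminal ends of the contig) can be illustrated with a minimal Python sketch. It assumes the BLASTn results are in the standard 12-column tabular format (`-outfmt 6`); the function names, the `terminal_bp` margin, and the sample values are hypothetical and are not taken from the study, which does not state how the terminal regions were delimited.

```python
from typing import List, NamedTuple

CONTIG_LENGTH = 194_200  # length of AGKD01000001.1 as given in the caption


class Hit(NamedTuple):
    subject: str      # subject sequence ID (e.g. a chromosome)
    identity: float   # percent identity
    length: int       # alignment length in bp
    qstart: int       # start on the query (1-based)
    qend: int         # end on the query (1-based)


def parse_outfmt6(text: str) -> List[Hit]:
    """Parse BLAST tabular output (qseqid sseqid pident length ... qstart qend ...)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[1], float(f[2]), int(f[3]), int(f[6]), int(f[7])))
    return hits


def select_hits(hits: List[Hit], min_length: int = 1000,
                terminal_bp: int = 0,
                contig_length: int = CONTIG_LENGTH) -> List[Hit]:
    """Keep hits longer than min_length that avoid the contig ends.

    terminal_bp is a hypothetical margin: any hit touching the first or
    last terminal_bp bases of the query is dropped.
    """
    kept = []
    for h in hits:
        lo, hi = sorted((h.qstart, h.qend))
        if h.length <= min_length:
            continue
        if lo <= terminal_bp or hi > contig_length - terminal_bp:
            continue
        kept.append(h)
    return kept
```

Calling `select_hits(parse_outfmt6(text), terminal_bp=5000)` would, under these assumptions, return only the long internal hits; the percent identity of each retained hit remains available in `Hit.identity` for checking the 80% to 100% range reported in the caption.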
Figure 5.
(A) Maximum likelihood phylogenetic tree reconstructed from a 953-nucleotide alignment of a conserved region of PiggyBac-like genes in fish genomes. Numbers at nodes represent percentage results of non-parametric bootstrapping with 1,000 replicates. (B) The PiggyBac-like elements identified in the salmon virus-like contig AGKD01000001.1 are shown schematically with their major genomic features, including the characteristic TTAA motif flanking the elements and a 13-bp terminal inverted repeat. One 750-bp intron is shown in purple, but we cannot rule out the presence of others. The schematic diagrams are not drawn to scale.
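The two structural hallmarks named in panel B, a flanking TTAA motif and a 13-bp terminal inverted repeat, can be checked on a candidate region with a short Python sketch. This is an illustration only, not the procedure used in the study: the function names are hypothetical, the example sequence is arbitrary, and the check assumes a perfect inverted repeat, whereas real elements may carry mismatches.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def has_piggybac_hallmarks(region: str, tir_len: int = 13,
                           motif: str = "TTAA") -> bool:
    """Check a region laid out as motif + element + motif.

    Returns True when the region starts and ends with the motif and the
    first tir_len bases of the element are the reverse complement of its
    last tir_len bases (a perfect terminal inverted repeat).
    """
    region = region.upper()
    m = len(motif)
    if len(region) < 2 * m + 2 * tir_len:
        return False
    if not (region.startswith(motif) and region.endswith(motif)):
        return False
    element = region[m:-m]
    left_tir = element[:tir_len]
    right_tir = element[-tir_len:]
    return left_tir == reverse_complement(right_tir)


# Arbitrary example: a made-up 13-bp repeat around a short internal sequence.
left = "ACGGTCATTGCAG"
example = "TTAA" + left + "ACGTACGTGG" + reverse_complement(left) + "TTAA"
print(has_piggybac_hallmarks(example))
```

A tolerance for mismatches in the inverted repeat could be added by counting differing positions instead of requiring equality.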
