2018 Jan 9;46(1):11-24.
doi: 10.1093/nar/gkx1150.

Diversity-generating retroelements: natural variation, classification and evolution inferred from a large-scale genomic survey

Li Wu et al. Nucleic Acids Res.

Abstract

Diversity-generating retroelements (DGRs) are novel genetic elements that use reverse transcription to generate vast numbers of sequence variants in specific target genes. Here, we present a detailed comparative bioinformatic analysis that depicts the landscape of DGR sequences in nature as represented by data in GenBank. Over 350 unique DGRs are identified, which together form a curated reference set of putatively functional DGRs. We classify target genes, variable repeats and DGR cassette architectures, and identify two new accessory genes. The great variability of target genes implies roles of DGRs in many undiscovered biological processes. There is much evidence for horizontal transfers of DGRs, and we identify lineages of DGRs that appear to have specialized properties. Because GenBank contains data from only 10% of described species, the compilation may not be wholly representative of DGRs present in nature. Indeed, many DGR subtypes are present only once in the set, and DGRs of the candidate phylum radiation (CPR) bacteria and of the Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota and Nanohaloarchaea archaea are exceptionally diverse in sequence, with little information available about functions of their target genes. Nonetheless, this study provides a detailed framework for classifying and studying DGRs as they are uncovered in the future.

Figures

Figure 1.
Structure and mechanism of the Bordetella phage DGR. The prototypic Bordetella phage DGR contains a target gene (mtd) with a variable repeat (VR), an accessory gene (avd), a template repeat (TR) and a reverse transcriptase gene (brt). During mutagenic retrohoming, a transcript of the TR is reverse transcribed, and the cDNA is integrated into the VR sequence of the target gene. During this process, A’s in the template are subject to mutagenesis: random nucleotides are incorporated opposite each A. This results in a new VR sequence that codes for a diversified phage tail protein. Additional sequence elements important for mutagenesis are the initiation of mutagenic homing (IMH) sequence at the end of the VR, and a nonidentical repeat, IMH*, at the end of the TR. A GC-rich inverted repeat is found downstream of IMH.
Figure 2.
Major classes of VR sequences. Five major classes of VR sequences are shown in WebLogo format, and were generated from the VR alignments in Supplementary Data 1. Under each profile, the regions corresponding to aa variability are indicated by a black bar. Additional minor classes are shown in Supplementary Figure S1.
Figure 3.
Protein domain structures of target genes. Schematics are shown for 39 domain variations of target genes, with one example of each (drawn to scale). Domain compositions are grouped by similarity; they are referred to as categories ‘a’ to ‘h’ in the text, and are displayed as (A-H) in the figure. Parentheses indicate the number of DGRs for each variation. The domains present are named according to the abbreviations used by CDD; a listing of the names of domains and their descriptions is in Supplementary Table S3. Asterisks over domains indicate the positions of diversification by mutagenic retrohoming. The abbreviation ‘(ext)’ indicates an extension of >250 aa with no identified motif. Codes for the domains are shown to the right, and correspond to abbreviations used in Supplementary Table S1.
Figure 4.
Example TR and VR sequence alignments. Alignments are shown for the TR and VR DNA sequences and the aa sequence of the VR. The TR is not translated in vivo, but the aa corresponding to its unmutated sequence are shown for comparison with the VR sequences. The capitalized DNA sequences correspond to positions alignable between the TR and VR, while the lower-case DNA sequences are not alignable. Red DNA residues are TR-VR sequence differences consistent with A-to-N mutagenesis; green residues denote other differences compared to the TR. Yellow shading denotes aa in the VR that result from A-to-N mutations, while green shading shows other aa differences. The IMH sequences in VRs are indicated with purple shading, IMH* by blue shading and the inverted repeat by orange shading. (A) The Bordetella phage DGR. (B) A DGR with three target genes, each being diversified through mutagenic retrohoming. (C) An example of an indel in a VR sequence that is opposite AAC in the TR. (D) An extreme example of length difference between TR and VR. (E) An example of non-A-to-N substitutions and a frame shift.
Figure 5.
Major architectures of genes in DGR cassettes. Architecture A1 has a core organization of TR-RT with different positions of the target gene (only one is shown). Architectures B1-B3 have core organizations of avd-TR-RT, avd-RT-TR and TR-(avd)-RT, respectively, with different arrangements of target genes. DGRs of Architectures C-E contain HRDC, MSL and CH1 accessory genes, respectively. Only one example is shown for Architectures A, C and E (A1, C1, E1); however, additional examples are in Supplementary Figure 3. Genes in parentheses indicate the reverse strand orientation of the gene. Numbers in parentheses indicate the count of examples in the data set (Supplementary Table S1).
Figure 6.
Phylogenetic tree of RTs and major VR classes. An unrooted maximum likelihood tree of RTs is shown with the VR class of each DGR indicated by color, and black dots indicating nodes with >75% bootstrap support. Colored arrows indicate the position of the Bordetella (red), Legionella (green) and Treponema (blue) DGRs. The four lineages identified in the text are shaded in gray, while the tan shading indicates DGRs from the CPR set (Supplementary Table S1). The order of taxa in the tree is the same order as in Supplementary Table S1, clockwise starting from the left boundary of the CPR DGRs.
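
The A-to-N mutagenesis described for Figure 1, and the distinction drawn in Figure 4 between TR-VR differences that are consistent with it and those that are not, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the example TR sequence and the uniform choice among the four nucleotides are all assumptions made for illustration, and the sequences are assumed to be uppercase, gap-free and already aligned to equal length.

```python
import random

NUCLEOTIDES = "ACGT"


def mutagenize(tr, rng):
    """Copy the TR, replacing every A with a randomly drawn nucleotide.

    Assumption: the draw is uniform over A, C, G and T, so an A may
    also stay an A. Non-A positions are copied unchanged.
    """
    return "".join(rng.choice(NUCLEOTIDES) if base == "A" else base for base in tr)


def classify_differences(tr, vr):
    """Split TR-VR mismatches into A-to-N-consistent and other positions.

    A mismatch is consistent with A-to-N mutagenesis when the TR base at
    that position is A. Returns two lists of 0-based positions.
    """
    if len(tr) != len(vr):
        raise ValueError("TR and VR must be aligned to the same length")
    a_to_n, other = [], []
    for pos, (t, v) in enumerate(zip(tr, vr)):
        if t == v:
            continue
        (a_to_n if t == "A" else other).append(pos)
    return a_to_n, other


# Hypothetical template repeat; not taken from the paper.
tr = "GCAACTGGAACTAC"
vr = mutagenize(tr, random.Random(0))
a_to_n, other = classify_differences(tr, vr)
print("TR:", tr)
print("VR:", vr)
print("A-to-N positions:", a_to_n, "other differences:", other)
```

For a VR produced by `mutagenize`, the list of other differences is always empty. Real VRs, such as the Figure 4 examples with indels and a frame shift, would need a proper alignment step before this kind of classification.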
