Diversity-generating retroelements (DGRs) are novel genetic elements that use reverse transcription to generate vast numbers of sequence variants in specific target genes. Here, we present a detailed comparative bioinformatic analysis that depicts the landscape of DGR sequences in nature as represented by data in GenBank. Over 350 unique DGRs are identified, which together form a curated reference set of putatively functional DGRs. We classify target genes, variable repeats and DGR cassette architectures, and identify two new accessory genes. The great variability of target genes implies roles of DGRs in many undiscovered biological processes. There is much evidence for horizontal transfers of DGRs, and we identify lineages of DGRs that appear to have specialized properties. Because GenBank contains data from only 10% of described species, the compilation may not be wholly representative of DGRs present in nature. Indeed, many DGR subtypes are present only once in the set, and DGRs of the candidate phylum radiation bacteria and of the Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota and Nanohaloarchaea archaea are exceptionally diverse in sequence, with little information available about the functions of their target genes. Nonetheless, this study provides a detailed framework for classifying and studying DGRs as they are uncovered in the future.
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.