Background: Retrotransposons are key players in the evolution of eukaryotic genomes. Moreover, it is now known that some retrotransposon classes, such as the abundant and plant-specific Sireviruses, have intriguingly distinctive host preferences. Yet it remains largely unknown whether this bias is underpinned by distinct genome structures.
Results: We performed a sensitive comparative analysis of the genomes of a large set of Ty1/copia retrotransposons. We discovered that Sireviruses are unique among Pseudoviridae in that they constitute an ancient genus characterized by vastly divergent members, which nevertheless contain highly conserved motifs in key non-coding regions: multiple polypurine tract (PPT) copies cluster upstream of the 3' long terminal repeat (3'LTR), of which the terminal PPT tethers to a distinctive attachment site and is flanked by a precisely positioned inverted repeat. Their LTRs possess a novel type of repeated motif (RM) defined by its exceptionally high copy number, symmetry and core CGG-CCG signature. These RM boxes form CpG islands and lie a short distance upstream of a conserved promoter region, hinting at regulatory functions. Intriguingly, in the envelope-containing Sireviruses additional boxes cluster in the 5' vicinity of the envelope. The 5'LTR/internal domain junction and a polyC-rich integrase signal are also highly conserved domains of the Sirevirus genome.
Conclusions: Our comparative analysis of retrotransposon genomes using advanced in silico methods highlighted the unique genome organization of Sireviruses. Their structure may dictate a life cycle whose regulation and transmission strategy differ from those of other Pseudoviridae, which may contribute to their pattern of distribution within and across plants.
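
As an illustration of the RM-box observations reported in the Results, the following minimal Python sketch counts CGG and CCG occurrences in a sequence and checks whether the sequence meets commonly used CpG-island thresholds (length of at least 200 bp, GC content above 50%, and CpG observed/expected ratio above 0.6). These thresholds are an assumption for illustration; the abstract does not state which criteria or software were used. The function names are hypothetical.

```python
import re


def cpg_stats(seq):
    """Return (GC fraction, CpG observed/expected ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n
    expected = (c * g) / n  # expected CpG count given base composition
    obs_exp = cpg / expected if expected else 0.0
    return gc_fraction, obs_exp


def count_motif(seq, motif):
    """Count occurrences of a motif, including overlapping ones."""
    # A zero-width lookahead lets overlapping matches be counted.
    return len(re.findall(f"(?={re.escape(motif)})", seq.upper()))


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply assumed thresholds (not taken from the source) to one sequence."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp
```

In practice, `count_motif(ltr, "CGG")` and `count_motif(ltr, "CCG")` would be applied to an LTR sequence (here a hypothetical variable `ltr`), and `looks_like_cpg_island` to the region spanning the RM boxes. Because CCG is the reverse complement of CGG, counting both on one strand also captures CGG occurrences on the opposite strand.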