Genome Biol Evol. 2009 Jul 23;1:205-20.
doi: 10.1093/gbe/evp023.

Exploring repetitive DNA landscapes using REPCLASS, a tool that automates the classification of transposable elements in eukaryotic genomes

Cédric Feschotte et al. Genome Biol Evol. 2009.

Abstract

Eukaryotic genomes contain large amounts of repetitive DNA, most of which is derived from transposable elements (TEs). Progress has been made in developing computational tools for ab initio identification of repeat families, but there is an urgent need for tools that automate the annotation of TEs in genome sequences. Here we introduce REPCLASS, a tool that automates the classification of TE sequences. Using control repeat libraries, we show that the program can accurately classify virtually any known TE type. Combining REPCLASS with ab initio repeat finding in the genomes of Caenorhabditis elegans and Drosophila melanogaster allowed us to recover the contrasting TE landscapes characteristic of these species. Unexpectedly, REPCLASS also uncovered several novel TE families in both genomes, augmenting the TE repertoire of these model species. When applied to the genomes of distant Caenorhabditis and Drosophila species, the approach revealed a remarkable conservation of TE composition profile within each genus, despite substantial interspecific covariations in genome size and in the number of TEs and TE families. Lastly, we applied REPCLASS to analyze 10 fungal genomes from a wide taxonomic range, most of which have not previously been analyzed for TE content. The results showed that TE diversity varies widely across the fungi "kingdom" and appears to correlate positively with genome size, in particular for DNA transposons. Together, these data validate REPCLASS as a powerful tool to explore the repetitive DNA landscapes of eukaryotes and to shed light on the evolutionary forces shaping TE diversity and genome architecture.

Keywords: genome annotation; repeat classification; repetitive elements; transposable elements; transposons.

Figures

FIG. 1.—
Overview of the REPCLASS workflow. Subroutines are shown in italics in black boxes. Databases are shown in gray cylinders. Each input query sequence (typically a consensus) is analyzed by the three classification modules of REPCLASS. HOM: homology-based, searches for similarity to known repeats deposited in Repbase using TBlastX and extracts the classification from a keyword index file; STR: structure-based, several subroutines search for structural features characteristic of different groups of TEs, such as terminal inverted repeats (TIR_search), LTRs (LTR_search), tRNA-like sequences (tRNAscan-SE), or polyA/SSRs (polyA/SSR_search); TSD: target site duplication, individual copies are extracted from the target genome sequence using BlastN and their flanking sequences are searched for TSDs. If no TSDs are found, the subroutine Helitron_scan is executed to look for structural features of Helitrons. The final step attempts to compare and integrate the results of the three modules, resulting in a tentative classification for each input sequence. For a complete description of the workflow and subroutines, see Results and Methods.
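The two structural checks named in the caption, a terminal inverted repeat search and a target site duplication search, can be illustrated with a minimal Python sketch. This is not the REPCLASS implementation: the function names `tir_length` and `find_tsd`, the exact-match criterion, and the `min_len`/`max_len` defaults are assumptions made here for illustration only, and a real tool would presumably tolerate mismatches.

```python
# Hypothetical helpers for illustration; NOT REPCLASS's TIR_search or TSD code.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tir_length(seq: str) -> int:
    """Length of the perfect terminal inverted repeat of seq (0 if none).

    In a TIR, the 5' end equals the reverse complement of the 3' end,
    so we compare seq with its own reverse complement from the start.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    n = 0
    while n < len(seq) // 2 and seq[n] == rc[n]:
        n += 1
    return n


def find_tsd(upstream: str, downstream: str,
             min_len: int = 2, max_len: int = 20) -> str:
    """Longest exact direct repeat that abuts both sides of an insertion.

    upstream is the flank ending at the element's 5' boundary; downstream
    is the flank starting at its 3' boundary. Returns "" if nothing of at
    least min_len bases matches (min_len and max_len are assumed defaults).
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    longest = min(max_len, len(upstream), len(downstream))
    for k in range(longest, min_len - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream[-k:]
    return ""
```

For example, `find_tsd("GGCATTA", "TTAGCC")` reports the 3-bp duplication `TTA`. Requiring exact matches keeps the sketch short, at the cost of missing degenerate TIRs and TSDs in older copies.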
FIG. 2.—
Validation of REPCLASS with Repbase libraries. Venn diagrams showing the number of consensus sequences in the Repbase Update (RU) library of (A) C. elegans (n = 116) and (B) D. melanogaster (n = 144) classified by the different modules of REPCLASS.
FIG. 3.—
TE composition profiles generated by REPCLASS for (A) three Caenorhabditis species and (B) three Drosophila species. The profile depicts the percentage of families falling within one of the four TE subclasses (LTR retrotransposons, non-LTR retrotransposons, cut-and-paste DNA transposons, and Helitrons).
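A composition profile of this kind is simply the share of classified families in each subclass. The sketch below shows the arithmetic under stated assumptions: the label strings and the function name `composition_profile` are hypothetical, and the input is a plain list of per-family subclass labels rather than actual REPCLASS output.

```python
from collections import Counter

# Assumed label strings for the four subclasses named in the caption.
SUBCLASSES = (
    "LTR retrotransposon",
    "non-LTR retrotransposon",
    "DNA transposon",
    "Helitron",
)


def composition_profile(family_subclasses):
    """Percentage of families in each subclass.

    Labels outside SUBCLASSES (e.g. unclassified families) are ignored,
    so the percentages are relative to classified families only.
    """
    counts = Counter(family_subclasses)
    total = sum(counts[s] for s in SUBCLASSES)
    if total == 0:
        return {s: 0.0 for s in SUBCLASSES}
    return {s: 100.0 * counts[s] / total for s in SUBCLASSES}
```

Because the profile is normalized per genome, two species with very different numbers of families can still be compared directly.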
FIG. 4.—
Relationship between genome size and the number of TE families classified by REPCLASS in 10 fungal genomes.
FIG. 5.—
TE composition profiles generated by REPCLASS for 10 fungal genomes. The species are ranked by increasing genome size from left to right. For taxonomic information, see table 2 and supplementary table 1 (Supplementary Material online), and for phylogenetic relationships, see Fitzpatrick et al. (2006).

References

    1. Andrieu O, Fiston AS, Anxolabehere D, Quesneville H. Detection of transposable elements by their compositional bias. BMC Bioinform. 2004;5:94.
    2. Bao Z, Eddy SR. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 2002;12:1269–1276.
    3. Belancio VP, Hedges DJ, Deininger P. Mammalian non-LTR retrotransposons: for better or worse, in sickness and in health. Genome Res. 2008;18:343–358.
    4. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573–580.
    5. Bergman CM, Quesneville H. Discovering and detecting transposable elements in genome sequences. Brief Bioinform. 2007;8:382–392.