BMC Genomics. 2008 Dec 10;9:595.
doi: 10.1186/1471-2164-9-595.

Genome Wide Survey, Discovery and Evolution of Repetitive Elements in Three Entamoeba Species

Free PMC article

Hernan Lorenzi et al. BMC Genomics.

Abstract

Background: Identification and mapping of repetitive elements is a key step for accurate gene prediction and overall structural annotation of genomes. During the assembly and annotation of three highly repetitive amoeba genomes, Entamoeba histolytica, Entamoeba dispar, and Entamoeba invadens, we performed comparative sequence analysis to identify and map all class I and class II transposable elements in their sequences.

Results: Here, we report the identification of two novel Entamoeba-specific repeats: ERE1 and ERE2; ERE1 is spread across the three genomes and associated with different repeats in a species-specific manner, while ERE2 is unique to E. histolytica. We also report the identification of two novel subfamilies of LINE and SINE retrotransposons in E. dispar and provide evidence for how the different LINE and SINE subfamilies evolved in these species. Additionally, we found a putative transposase-coding gene in E. histolytica and E. dispar related to the mariner transposon Hydargos from E. invadens. The distribution of transposable elements in these genomes is markedly skewed, with a tendency to form clusters. More than 70% of each of the three genomes has a repeat density below the corresponding average value, indicating that transposable elements are not evenly distributed. We show that repeats and repeat-clusters are found at syntenic break points between E. histolytica and E. dispar and hence could act as recombination hot spots promoting genome rearrangements.

Conclusion: The mapping of all transposable elements found in these parasites shows that repeat coverage is up to three times higher than previously reported. LINE, ERE1 and mariner elements were present in the common ancestor of the three Entamoeba species, while ERE2 was likely acquired by E. histolytica after its separation from E. dispar. We demonstrate that E. histolytica and E. dispar share their entire repertoire of LINE and SINE retrotransposons and that Eh_SINE3/Ed_SINE1 originated as a chimeric SINE from Eh/Ed_SINE2 and Eh_SINE1/Ed_SINE3. Our work shows that transposable elements are organized in clusters that are frequently found at syntenic break points, providing insights into their contribution to chromosome instability and, therefore, to genomic variation and speciation in these parasites.

Figures

Figure 1
Identification and characterization of ERE1 in Entamoeba sp. A) Reconstruction of Eh_ERE1 consensus sequence from multiple fragmented copies scattered along the E. histolytica assembly. Green boxes, flanking Eh_ERE1 terminal inverted repeats (TIR); black boxes, Eh_ERE1 core region; yellow boxes, single Eh_ERE1 ORF where the arrow indicates sense of transcription; white boxes, E. histolytica scaffolds. Numbers represent coordinates within scaffolds. GenBank accession numbers of scaffolds are indicated on the left. B) Multiple alignment of the consensus protein sequences coded by Eh_ERE1, Ed_ERE1 and Ei_ERE1. Black-shaded letters, identical residues; gray-shaded letters, conservative changes. C) Syntenic regions from E. histolytica (top) and E. dispar (bottom) showing an example of Eh_ERE1 transposition. White boxes, protein coding genes; black box, Eh_ERE1; red boxes, LINEs; gray areas, regions of similarity. GenBank locus tags are indicated above or below genes. Scaffold GenBank accessions are shown on the left. Features on the forward or reverse strand are displayed above or below the scaffolds, respectively.
Figure 2
Characterization of ERE2 in Entamoeba histolytica. A) Schematic representation of ERE2. Yellow box, ERE2 open reading frame; white box, ERE2 core region; green boxes, imperfect terminal inverted repeats (TIR); striped boxes, target site duplications (TSD). B) Multiple sequence alignment of the ERE2 5' and 3' imperfect inverted repeats and insertion sites. Bold letters represent inverted repeats; asterisks denote nucleotides conserved in both TIRs; target site duplications are shown underlined. C) Example of Eh_ERE2 transposition in a syntenic region from E. histolytica (top) and E. dispar (bottom). White boxes, protein coding genes; black box, Eh_ERE2; red boxes, LINEs. Orthologous pairs of genes are denoted by gray shading. Scaffold GenBank accession numbers are indicated on the left.
Figure 3
Identification of mariner-related elements in E. histolytica and E. dispar. A) Phylogenetic position of Eh_mariner, Ed_mariner and Ei_Hydargos (highlighted in blue) in the IS630/Tc1/mariner superfamily. Mariner subfamilies and related transposons (Tc1, ItmD37E, and plant mariner-like elements) are shown. Elements are identified by host name and GeneInfo Identifier (gi). Branches supported by less than 500 bootstrap replicates are depicted as thin black lines; branches having bootstrap values between 500 and 750 are shown as bold grey lines; branches with values above 750 are represented as bold black lines. B) ClustalX alignment of the transposase domain found in Eh_mariner and Ed_mariner together with three closely related transposases. Amino acids conserved in at least 3 sequences are colored in black. Asterisks denote the three conserved glutamic residues typical of this type of transposases. Parentheses indicate number of residues between conserved blocks.
Figure 4
Characterization and phylogenetic analysis of LINE elements in Entamoeba sp. A) Multiple sequence alignment of the 5' and 3' ends of Ei_LINE and insertion sites. The 5' and 3' termini are highlighted in bold. Target site duplications (TSD) are underlined. Ei_LINEs from contigs AANW02000355, AANW02001294, AANW02001046 and AANW02001402 are truncated and lack either the 5' or 3' end of the element. Genomic coordinates for Ei_LINEs excluding ISD are: AANW02000718 (41,801–46,844), AANW02000022 (38,839–43,869), AANW02000355 (5,491–6,819), AANW02001294 (486–1,665), AANW02001046 (2,758–1,949) and AANW02001402 (3,418–2,447). GenBank accessions of E. invadens contigs are indicated on the left. B) Phylogenetic analysis of the reverse transcriptase sequences from all identified Entamoeba LINEs compared to reverse transcriptases derived from different families of retroelements and retroviruses. Thin black lines, branches with bootstrap values below 500; bold grey lines, branches with bootstrap values between 500 and 750; bold black lines, branches supported by bootstrap values above 750. Nodes containing Entamoeba LINEs are highlighted in blue.
Figure 5
Evolutionary analysis of Eh_SINE3/Ed_SINE1 in E. histolytica and E. dispar. A) and B) All-vs-all dot-plot analyses of the first (A) or last (B) 240 bp of Eh_SINE1, Eh_SINE2, Eh_SINE3 and Ed_SINE1. Each dot represents at least 60 identical nucleotides between sequences using a sliding window 100 bp wide. Numbers above or at the left of each dot-plot represent nucleotide positions for each sequence. Comparisons between Eh_SINE1 and either Eh_SINE3 or Ed_SINE1 are highlighted in red, while plots between Eh_SINE2 and either Eh_SINE3 or Ed_SINE1 are highlighted in green. C) and D) Phylogenetic trees showing the relationships between the first 240 bp (C) or last 240 bp (D) of Eh_SINE3/Ed_SINE1 and Eh_SINE1, Eh_SINE2, Eh_LINE1 and Eh_LINE2. Branches supported by bootstrap values between 500 and 750 or above 750 are depicted in grey or black, respectively.
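The dot-plot criterion in this caption (a dot wherever a 100 bp window contains at least 60 identical nucleotides) can be illustrated with a minimal Python sketch. This is not the authors' software: the function name `dot_plot`, its parameters, and the simplification of comparing windows position by position without gaps are assumptions made for illustration only.

```python
def dot_plot(seq_a, seq_b, window=100, min_identical=60):
    """Return (i, j) window start positions (0-based) that would get a dot.

    Assumption: a dot is drawn when the window starting at i in seq_a and
    the window starting at j in seq_b have at least `min_identical`
    matching nucleotides when compared position by position (no gaps).
    """
    dots = []
    # range() is empty when a sequence is shorter than the window.
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            win_b = seq_b[j:j + window]
            matches = sum(1 for x, y in zip(win_a, win_b) if x == y)
            if matches >= min_identical:
                dots.append((i, j))
    return dots


# Toy usage with a small window; real input would be 240 bp SINE ends.
toy_dots = dot_plot("ACGTACGT", "ACGTACGT", window=4, min_identical=4)
```

The brute-force double loop is adequate for 240 bp inputs; longer sequences would call for a running match count along each diagonal.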
Figure 6
Phylogenetic analysis of SINE elements in E. histolytica and E. dispar. A) Schematic representation of an E. dispar locus containing a copy of Ed_SINE3 (green boxes) interrupted by the insertion of an Ed_SINE2 (white box) generating target site duplications (TSD, black boxes). The diagonally striped box represents an Ed_LINE2 located at the end of the scaffold. Scaffold GenBank accession is indicated on the left. B) Phylogenetic analysis of the three SINE families found in E. histolytica and E. dispar. All tree nodes have a bootstrap value of 1000 (1000 replicates).
Figure 7
Distribution analysis of transposable elements in Entamoeba sp. A) Distribution of inter-repeat distances in E. histolytica (Eh, red), E. dispar (Ed, blue) and E. invadens (Ei, green). Mean and median values for each species are indicated by vertical lines. Distances were grouped using 100 bp bins. B) Diagram representing the number of simultaneous occurrences of all possible pairs of different repeats within repeat-clusters for each species. The thickness of the line connecting two different repeats is proportional to the number of times that repeat pair is part of a cluster. C) Schematic diagram showing the association among Eh_SINE3/Ed_SINE1, LINE1 and ERE1 in E. histolytica and E. dispar. This three-component unit composed of LINE1, Ed_SINE1 and ERE1 is found amplified several times in E. dispar. GenBank accessions are indicated on the left of the figure. Numbers denote scaffold coordinates. Red boxes, Eh/Ed_LINE2; grey boxes, Eh/Ed_SINE2; yellow boxes, Eh_SINE3/Ed_SINE1; black boxes, Eh/Ed_LINE1; striped boxes, Eh/Ed_ERE1.
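The quantities in panels A and B can be sketched in Python as follows. This is a hedged illustration, not the authors' pipeline: the input layout (tuples of scaffold, start, end with 1-based inclusive coordinates), the function names, and the choice to count a repeat pair once per cluster are all assumptions, and how clusters themselves were defined is not stated in this caption.

```python
from collections import Counter
from itertools import combinations


def inter_repeat_distances(repeats):
    """Gaps (bp) between consecutive repeats on the same scaffold.

    repeats: list of (scaffold, start, end), 1-based inclusive (assumed).
    Overlapping or adjacent repeats contribute a distance of 0.
    """
    by_scaffold = {}
    for scaffold, start, end in repeats:
        by_scaffold.setdefault(scaffold, []).append((start, end))
    distances = []
    for intervals in by_scaffold.values():
        intervals.sort()
        prev_end = intervals[0][1]
        for start, end in intervals[1:]:
            distances.append(max(0, start - prev_end - 1))
            prev_end = max(prev_end, end)
    return distances


def bin_distances(distances, bin_size=100):
    """Histogram keyed by the lower edge of each 100 bp bin."""
    return Counter((d // bin_size) * bin_size for d in distances)


def pair_cooccurrence(clusters):
    """Count clusters in which each pair of different repeat families co-occurs."""
    counts = Counter()
    for cluster in clusters:
        for pair in combinations(sorted(set(cluster)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data, for illustration only.
toy_repeats = [("s1", 1, 100), ("s1", 351, 400), ("s1", 401, 500), ("s2", 10, 20)]
toy_distances = inter_repeat_distances(toy_repeats)
toy_pairs = pair_cooccurrence([["LINE1", "SINE1", "ERE1"], ["LINE1", "ERE1", "LINE1"]])
```

In a diagram like panel B, the values returned by `pair_cooccurrence` would set the line thickness between two repeat families.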
Figure 8
Repeat densities in Entamoeba sp. Proportion of genomic regions with repeat densities below (green) or above (yellow) the average density value for each genome. Repeat densities are expressed as number of repeats every 10 Kb. Scaffolds were positioned into one of the two categories based on their repeat coverage.
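A minimal Python sketch of the density split described in this caption is shown below. It assumes each scaffold is summarized as a (length in bp, repeat count) pair, that the genome average is total repeats over total length, and that the reported proportion is the share of base pairs in scaffolds below that average. These choices and the name `density_split` are illustrative assumptions, not details confirmed by the source.

```python
def density_split(scaffolds, unit=10_000):
    """Return (average repeats per `unit` bp, fraction of bp below average).

    scaffolds: list of (length_bp, repeat_count) pairs (assumed layout).
    """
    total_len = sum(length for length, _ in scaffolds)
    total_repeats = sum(count for _, count in scaffolds)
    # Genome-wide density expressed as repeats per 10 Kb.
    average = total_repeats * unit / total_len
    # Base pairs lying in scaffolds whose own density is below the average.
    below_bp = sum(
        length for length, count in scaffolds
        if count * unit / length < average
    )
    return average, below_bp / total_len


# Hypothetical toy genome: two repeat-poor scaffolds and one repeat-rich one.
toy_average, toy_fraction_below = density_split(
    [(50_000, 5), (30_000, 3), (20_000, 42)]
)
```

With a few repeat-dense scaffolds pulling the average up, most of the sequence falls below it, which is the kind of skew the figure reports.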
Figure 9
Example of repeat clusters at a syntenic break between E. histolytica and E. dispar. Black boxes, repetitive elements; blue boxes, E. histolytica genes; red boxes, E. dispar genes. GenBank accession numbers for each scaffold are shown above lines. Orthologous gene pairs between E. histolytica and E. dispar are connected by gray areas. Percent identity plots between nucleotide sequences DS571146 and DS550750 (positions 6 Kb – 19 Kb) and nucleotide sequences DS571146 and DS550750 (positions 32 Kb – 51 Kb) are shown above and below scaffolds respectively. Numbers indicate scaffold coordinates in Kb. Vertical gray lines depict the locations where synteny disappears.
