Nucleic Acids Res. 2017 Sep 6;45(15):8943-8956.
doi: 10.1093/nar/gkx607.

Integrative and conjugative elements and their hosts: composition, distribution and organization

Jean Cury et al. Nucleic Acids Res.

Abstract

Conjugation of single-stranded DNA drives horizontal gene transfer between bacteria and has been widely studied in conjugative plasmids. Although integrative and conjugative elements (ICEs) are more abundant, their organization and function have been studied in only a few model systems. Comparative genomics of ICEs has been precluded by the difficulty of finding and delimiting these elements. Here, we present the results of a method that circumvents these problems by requiring only the identification of the conjugation genes and the species' pan-genome. We delimited 200 ICEs, which allowed the first large-scale characterization of these elements. We quantified the presence in ICEs of a wide set of functions associated with the biology of mobile genetic elements, including some that are typically associated with plasmids, such as partition and replication. Protein sequence similarity networks and phylogenetic analyses revealed that ICEs are structured in functional modules. Integrases and conjugation systems have different evolutionary histories, even if the gene repertoires of ICEs can be grouped according to conjugation type. Our characterization of the composition and organization of ICEs paves the way for future functional and evolutionary analyses of their cargo genes, most of which are of unknown function.

Figures

Figure 1.
Mating Pair Formation (MPF) types and procedure for ICE delimitation. (I) The phylogenetic tree displays the evolutionary relationships between MPF types as given by the VirB4 phylogeny. Most lineages are from diderms (green branches); the systems from monoderms (yellow branches, including Firmicutes, Actinobacteria, Archaea and Tenericutes) are derived from these. MPFB (Bacteroides) and MPFC (Cyanobacteria) were absent from our data because too few genomes had been sequenced in those clades. The full green clades indicate systems that are typically found in Proteobacteria. The label TcpA indicates a clade that uses this protein, an ATPase homologous to the typical T4CP VirD4, as its T4CP. In front of each tip of the tree, we indicate a non-exhaustive list of well-known conjugative elements for each MPF type: ICEs (names starting with 'ICE' or '(C)Tn') or plasmids (names starting with 'p'). The phylogenetic tree was adapted from (15). (II) Scheme of the method. Boxes represent genes; circles represent chromosomes. (A) Genes encoding conjugative systems (red) were detected in bacterial genomes using MacSyFinder. At this stage, this indicates the presence of an ICE that remains to be delimited. (B) We restricted the dataset of ICEs to those present in the 37 species for which we had at least four genomes (and a chromosomal conjugative system). We built the core genome of each species (core genes are represented in blue). The region between two consecutive core genes is defined as an interval. (C) The information on the conjugative system and the core genes is used to delimit the chromosomal interval harboring the ICE. Hence, two core genes (in green) flank the ICE and define an upper bound for its limits. (D) Representation of the spot. The two families of core genes (green) define intervals in several genomes of the species (typically in all of them). The set of such intervals is called a spot and is represented here from the point of view of the interval that contains an ICE. We built the spot pan-genome, i.e. we identified the gene families present in the spot, and mapped this information onto the interval with the conjugative system. Hence, the bottom layer of genes represents the genes of the interval with the ICE. The upper layers represent other genomes (one genome per layer), and the boxes correspond to genes that are orthologs of the genes in the interval with the ICE (genes lacking orthologs are omitted to simplify the representation). Finally, the manual delimitation is based on a visual representation of the spot that includes this information and the G+C content (see Supplementary Figure S1 and Materials and Methods).
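Steps (B) to (D) of panel II can be sketched in a few lines. This is a hypothetical illustration, not the authors' pipeline: it assumes each gene is a `(gene_id, start_bp, family_id)` tuple, treats chromosomes as linear (ignoring the interval that wraps around a circular chromosome), and uses shared pan-genome family identifiers as a stand-in for orthology. The function names are invented for the example.

```python
# Hypothetical sketch of interval-based ICE delimitation (Figure 1, II B-D).
# Assumptions: genes are (gene_id, start_bp, family_id) tuples, chromosomes are
# linear, and shared pan-genome family IDs stand in for orthology.
from typing import Dict, List, Optional, Set, Tuple

Gene = Tuple[str, int, str]  # (gene_id, start position in bp, gene family)


def core_intervals(genes: List[Gene], core: Set[str]) -> List[List[Gene]]:
    """Split a chromosome into intervals between consecutive core genes.

    Each interval keeps its two flanking core genes at its ends.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    idx = [i for i, g in enumerate(ordered) if g[2] in core]
    return [ordered[a:b + 1] for a, b in zip(idx, idx[1:])]


def interval_with_ice(genes: List[Gene], core: Set[str],
                      conj_ids: Set[str]) -> Optional[List[Gene]]:
    """Return the interval whose non-core genes include every conjugation gene.

    The two flanking core genes give an upper bound on the ICE limits.
    Returns None if the conjugation genes do not fall in a single interval.
    """
    for interval in core_intervals(genes, core):
        inner_ids = {g[0] for g in interval[1:-1]}
        if conj_ids <= inner_ids:
            return interval
    return None


def spot_presence(ice_interval: List[Gene], other_genomes: List[List[Gene]],
                  core: Set[str]) -> Dict[str, int]:
    """For each gene of the ICE interval, count the intervals of the same spot
    (same pair of flanking core families) in other genomes that contain a gene
    of the same family."""
    flanks = {ice_interval[0][2], ice_interval[-1][2]}
    spot_families = [
        {g[2] for g in interval[1:-1]}
        for genes in other_genomes
        for interval in core_intervals(genes, core)
        if {interval[0][2], interval[-1][2]} == flanks
    ]
    return {g[0]: sum(g[2] in fams for fams in spot_families)
            for g in ice_interval[1:-1]}


# Toy example (invented data): core families C1-C3, one conjugation gene (g3).
core = {"C1", "C2", "C3"}
genome = [("g1", 0, "C1"), ("g2", 1000, "A1"), ("g3", 2000, "virB4"),
          ("g4", 3000, "C2"), ("g5", 4000, "C3")]
other = [("h1", 0, "C1"), ("h2", 500, "A1"), ("h3", 900, "C2"), ("h4", 2000, "C3")]
ice_interval = interval_with_ice(genome, core, {"g3"})
print([g[0] for g in ice_interval])                # ['g1', 'g2', 'g3', 'g4']
print(spot_presence(ice_interval, [other], core))  # {'g2': 1, 'g3': 0}
```

In the toy spot, `g2` has a counterpart in the other genome while the conjugation gene `g3` does not, which is the kind of signal the layered representation in panel D makes visible before manual delimitation.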
Figure 2.
ICE statistics as a function of the MPF type. Top. Distribution of ICE sizes (in kb). The numbers above each violin plot give the number of elements in each category. Bottom. Distribution of the pairwise differences between the GC content of each ICE and that of its host. The violin plots show kernel density estimates of the underlying values and are truncated at the minimum and maximum values. ***P-value < 0.001, Wilcoxon signed-rank test (rejecting the null hypothesis that the difference is equal to zero).
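The GC comparison in the bottom panel can be sketched as follows, assuming one ICE-minus-host GC difference per ICE. The helper names are hypothetical, and the test uses the normal approximation (with average ranks and a tie correction) rather than the exact null distribution; for small samples an exact implementation such as `scipy.stats.wilcoxon` would be preferable.

```python
# Sketch: GC content difference per ICE and a two-sided Wilcoxon signed-rank
# test (normal approximation). Function names are hypothetical.
import math
from typing import Sequence, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C among A/C/G/T bases (other symbols, e.g. N, ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def wilcoxon_signed_rank(diffs: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided test of H0: the differences are symmetric around zero.

    Zero differences are discarded; tied |d| values get average ranks and the
    variance is tie-corrected. Returns (W_plus, z, p_value).
    """
    d = [x for x in diffs if x != 0]
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return w_plus, z, p


# Toy differences (ICE GC minus host GC); invented, not data from the study.
diffs = [-0.05, -0.03, -0.04, -0.02, -0.06, -0.01, -0.07, -0.02]
w_plus, z, p = wilcoxon_signed_rank(diffs)
print(w_plus, round(z, 2), p < 0.05)  # 0 -2.52 True
```

In practice each difference would be computed as `gc_content(ice_seq) - gc_content(host_seq)` for one ICE and its host chromosome.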
Figure 3.
Representation of the wGRR-based network of ICEs. The nodes represent ICEs, and the edges link pairs of ICEs with a wGRR score >5% (edge thickness is proportional to the score). Left. Nodes are colored according to the MPF type. Darker nodes, indicated by an arrow, represent ICEs commonly used as experimental models. Right. Nodes are colored according to the host species to highlight the distribution of the 37 species. The species and MPF type of each ICE are given in Supplementary Table S4. Node positions were determined with the Fruchterman-Reingold force-directed algorithm, as implemented in the NetworkX Python library (spring layout).
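The edge list of such a network can be sketched as below. The legend does not define wGRR; this sketch assumes the definition used in related work, namely the sum of the sequence identities of bidirectional best hits (BBH) between two elements divided by the number of genes of the smaller element. The inputs and function names are hypothetical.

```python
# Hypothetical sketch: wGRR edges between ICEs, assuming
# wGRR(A, B) = sum(identities of BBH pairs) / min(#genes A, #genes B).
from itertools import combinations
from typing import Dict, List, Tuple


def wgrr(bbh_identities: List[float], n_genes_a: int, n_genes_b: int) -> float:
    """wGRR between two elements from the identities (0-1) of their BBH pairs."""
    return sum(bbh_identities) / min(n_genes_a, n_genes_b)


def wgrr_edges(sizes: Dict[str, int],
               bbh: Dict[Tuple[str, str], List[float]],
               threshold: float = 0.05) -> List[Tuple[str, str, float]]:
    """Edges (a, b, score) for every pair of ICEs with wGRR above threshold."""
    edges = []
    for a, b in combinations(sorted(sizes), 2):
        identities = bbh.get((a, b)) or bbh.get((b, a)) or []
        score = wgrr(identities, sizes[a], sizes[b])
        if score > threshold:
            edges.append((a, b, score))
    return edges


# Toy input (invented): gene counts per ICE and BBH identities per pair.
sizes = {"ICE1": 50, "ICE2": 40, "ICE3": 60}
bbh = {("ICE1", "ICE2"): [0.5] * 20, ("ICE3", "ICE1"): [0.5]}
print(wgrr_edges(sizes, bbh))  # [('ICE1', 'ICE2', 0.25)]
```

The resulting weighted edges could then be loaded into a `networkx.Graph` with `add_weighted_edges_from` and positioned with `networkx.spring_layout`, the Fruchterman-Reingold implementation named in the legend.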
Figure 4.
Phylogenetic tree of tyrosine recombinases. The tree was built using 60 prophage integrases (labelled '…_PHAGE', brown), 11 integrases from pathogenicity islands ('…_PAI', mauve), 25 XerC, XerD, XerS or XerH recombinases ('..._XER..', greenish grey), 7 integron integrases ('.._intI', greenish grey) and 134 integrases from ICEs (colored by MPF type). The tree was built using PhyloBayes; the values are the posterior probabilities supporting each partition. Nodes with support below 0.3 were collapsed because of insufficient resolution (see Materials and Methods). The black arc denotes a well-supported clade, discussed in the text, that contains integrases from prophages, PAIs and ICEs of different MPF types.
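Collapsing poorly supported nodes turns them into polytomies. The sketch below is a hand-rolled illustration on a minimal in-memory tree, assuming internal nodes carry their posterior support; it does not parse PhyloBayes output, and the `Node` class is hypothetical.

```python
# Hand-rolled sketch: collapse internal nodes whose posterior support is below
# a cutoff (0.3 in the legend). The Node class is a hypothetical structure.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    support: Optional[float] = None  # posterior probability of the partition
    children: List["Node"] = field(default_factory=list)


def collapse_low_support(node: Node, cutoff: float = 0.3) -> Node:
    """Collapse internal nodes with support below cutoff (bottom-up, in place).

    A collapsed node is removed and its children are attached to its parent,
    creating a polytomy.
    """
    new_children = []
    for child in node.children:
        collapse_low_support(child, cutoff)
        if child.children and child.support is not None and child.support < cutoff:
            new_children.extend(child.children)
        else:
            new_children.append(child)
    node.children = new_children
    return node


def to_newick(node: Node) -> str:
    """Topology-only Newick string (no branch lengths or support values)."""
    if not node.children:
        return node.name
    return "(" + ",".join(to_newick(c) for c in node.children) + ")" + node.name


tree = Node(children=[
    Node("A"),
    Node(support=0.2, children=[
        Node("B"),
        Node(support=0.9, children=[Node("C"), Node("D")]),
    ]),
])
print(to_newick(collapse_low_support(tree)) + ";")  # (A,B,(C,D));
```

The node with support 0.2 disappears and its children join the root, while the well-supported (C,D) clade is kept.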
Figure 5.
Representation of EggNOG functional categories in ICEs relative to the host chromosome. The bars represent the number of ICEs in which a given category is more frequent than in the host chromosome (N(fICE>fHOST)). The red dotted line represents the expected value under the null hypothesis that a category is in similar proportion in the ICE and in its host's chromosome. Bars marked NS indicate no significant difference (P > 0.05, binomial test with 199 trials and an expected probability of 0.5); all other bars are significantly different (P < 0.05, same test). There are 199 trials because one of the 200 ICEs could not be typed and was excluded (see text). Error bars represent 95% confidence intervals computed from 1000 bootstrap replicates. 'Not in EggNOG' represents the class of genes that did not match any EggNOG profile.
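The statistics behind each bar can be sketched as follows, assuming one indicator per typed ICE (1 if the category is more frequent in the ICE than in its host chromosome) and assuming the bootstrap resamples ICEs with replacement; the legend does not state the resampling unit. Function names are hypothetical.

```python
# Sketch: exact two-sided binomial test (H0: p = 0.5) and a percentile
# bootstrap interval for N(fICE > fHOST). Names are hypothetical.
import math
import random
from typing import List, Tuple


def binom_test_half(k: int, n: int) -> float:
    """Exact two-sided binomial test of H0: p = 0.5.

    The null distribution is symmetric, so the p-value is twice the smaller tail.
    """
    tail = min(k, n - k)
    p_tail = sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_tail)


def bootstrap_ci(indicators: List[int], n_boot: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[int, int]:
    """Percentile bootstrap interval for the number of ICEs with fICE > fHOST."""
    rng = random.Random(seed)
    n = len(indicators)
    counts = sorted(sum(rng.choices(indicators, k=n)) for _ in range(n_boot))
    lo_idx = int(round(alpha / 2 * n_boot))
    hi_idx = int(round((1 - alpha / 2) * n_boot)) - 1
    return counts[lo_idx], counts[hi_idx]


# Toy data (invented): a category enriched in 130 of 199 typed ICEs.
indicators = [1] * 130 + [0] * 69
print(binom_test_half(sum(indicators), len(indicators)) < 0.05)  # True
print(bootstrap_ci(indicators))
```

With p = 0.5 the doubled-tail rule gives the same two-sided p-value as summing all outcomes at most as probable as the observed one, because the binomial distribution is then symmetric.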
Figure 6.
Average organization of ICEs. Each row represents an MPF type, and its length is proportional to the mean size of the ICEs of that type. Colors represent different classes of functions. The black line represents the proportion of genes with a predicted function per bin. The classes of functions correspond to those described in detail in Supplementary Figure S9A-B-C-F. More precisely, Conjugation includes MPF-associated genes and the relaxase. Defense includes antibiotic resistance genes, restriction-modification systems and solitary methylases. Metabolism includes genes annotated as such by EggNOG. Recombination includes tyrosine and serine recombinases and DDE transposases. Stability includes replication, partition and entry exclusion systems. Bar heights are proportional to the proportion of genes of a given function at that position among the genes with a predicted function. Each bin is 1 kb wide.
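The per-bin quantities in this figure can be sketched as below. This is a hypothetical illustration that assumes each gene is given as its start position in bp relative to the ICE start, together with its function class (or `None` when no function is predicted); how the authors positioned genes along the averaged rows is not stated here.

```python
# Hypothetical sketch of the per-bin summary behind Figure 6, assuming genes
# are (start position in bp relative to the ICE start, function class or None).
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

BIN_BP = 1000  # 1 kb bins, as in the legend


def per_bin_summary(genes: Iterable[Tuple[int, Optional[str]]]
                    ) -> Dict[int, Tuple[float, Dict[str, float]]]:
    """Map bin index -> (fraction of genes with a predicted function,
    {class: proportion among genes with a predicted function})."""
    by_bin = defaultdict(list)
    for pos, cls in genes:
        by_bin[pos // BIN_BP].append(cls)
    summary = {}
    for b, classes in sorted(by_bin.items()):
        annotated = [c for c in classes if c is not None]
        counts = Counter(annotated)
        props = {c: k / len(annotated) for c, k in counts.items()}
        summary[b] = (len(annotated) / len(classes), props)
    return summary


# Toy genes (invented) pooled from ICEs of one MPF type.
genes = [(100, "Conjugation"), (500, None), (900, "Recombination"),
         (1200, "Conjugation"), (1800, "Conjugation")]
for b, (frac, props) in per_bin_summary(genes).items():
    print(b, round(frac, 2), props)
# 0 0.67 {'Conjugation': 0.5, 'Recombination': 0.5}
# 1 1.0 {'Conjugation': 1.0}
```

The first element of each tuple corresponds to the black line (share of genes with a predicted function), and the dictionary to the stacked bar heights.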
Figure 7.
Proportion of ICEs with flanking tRNAs on either side of the ICE.

References

    1. Ochman H., Lawrence J.G., Groisman E.A. Lateral gene transfer and the nature of bacterial innovation. Nature. 2000; 405:299–304.
    1. Popa O., Dagan T. Trends and barriers to lateral gene transfer in prokaryotes. Curr. Opin. Microbiol. 2011; 14:615–623.
    1. Polz M.F., Alm E.J., Hanage W.P. Horizontal gene transfer and the evolution of bacterial and archaeal population structure. Trends Genet. 2013; 29:170–175.
    1. Medini D., Donati C., Tettelin H., Masignani V., Rappuoli R. The microbial pan-genome. Curr. Opin. Genet. Dev. 2005; 15:589–594.
    1. Davies J., Davies D. Origins and evolution of antibiotic resistance. Microbiol. Mol. Biol. Rev. 2010; 74:417–433.