Appl Environ Microbiol. 2012 Aug;78(15):5288-96.
doi: 10.1128/AEM.00564-12. Epub 2012 May 25.

Oral spirochetes implicated in dental diseases are widespread in normal human subjects and carry extremely diverse integron gene cassettes


Yu-Wei Wu et al. Appl Environ Microbiol. 2012 Aug.

Abstract

The NIH Human Microbiome Project (HMP) has produced several hundred metagenomic data sets, enabling studies of the many functional elements in human-associated microbial communities. Here, we survey the distribution of oral spirochetes implicated in dental diseases across normal human individuals. Our survey uses the recombination sites associated with the chromosomal integron in Treponema genomes, exploits the fact that these sites (repeats) occur in multiple copies per genome, and applies a targeted assembly approach that we have developed. We find that integron-containing Treponema species are present in ∼80% of the normal human subjects included in the HMP. Further, our constrained assembly approach, which makes unique use of the information in the de Bruijn assembly graph, allows us to assemble the integron gene cassettes de novo. Most of these cassette genes were not assembled in whole-metagenome assemblies, and because integron gene cassettes are so dynamic, they could not be identified by mapping sequencing reads onto the known reference Treponema genomes. Our study significantly enriches the pool of genes known to be carried by Treponema chromosomal integrons, totaling 826 genes (598 of them nonredundant at 97% sequence identity). We characterize the functions of these gene cassettes and find that many of the genes have unknown functions. The integron gene cassette arrays found in the human microbiome are extraordinarily dynamic, with different microbial communities sharing only a small number of common genes.
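The "97% nonredundant" count reflects collapsing near-identical genes into clusters and counting one representative per cluster. A minimal sketch of greedy, longest-first clustering in the spirit of CD-HIT follows; it is not the authors' pipeline. As an assumption for illustration, `difflib.SequenceMatcher.ratio()` stands in for the alignment-based identity that CD-HIT actually computes, and `greedy_cluster` is a hypothetical helper name.

```python
from difflib import SequenceMatcher
from typing import List


def greedy_cluster(seqs: List[str], threshold: float) -> List[List[int]]:
    """Greedy, longest-first clustering; each cluster lists its representative first.

    Assumption: SequenceMatcher.ratio() is a crude stand-in for the
    alignment-based sequence identity used by tools such as CD-HIT.
    """
    # Process sequences from longest to shortest (stable for equal lengths).
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    clusters: List[List[int]] = []
    for i in order:
        for cluster in clusters:
            representative = seqs[cluster[0]]
            if SequenceMatcher(None, representative, seqs[i]).ratio() >= threshold:
                cluster.append(i)  # redundant with an existing representative
                break
        else:
            clusters.append([i])  # no match: this sequence founds a new cluster
    return clusters


genes = ["MKTAYIAKQR", "MKTAYIAKQK", "GGGSSSPPPL"]
clusters = greedy_cluster(genes, threshold=0.85)
print(len(genes), "genes,", len(clusters), "nonredundant")
```

The number of clusters plays the role of the nonredundant gene count; raising the threshold toward 1.0 splits clusters and increases that count.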


Figures

Fig 1
(A) Neighbor-joining tree of the eight representative sequences of the T. denticola chromosomal integron recombination sites. Each sequence is named by the starting position of the site in the genome. The multiple alignment was prepared with ClustalW, and the neighbor-joining tree was built with Jalview. (B) Predicted structure of one representative sequence, attC1870410, which has the typical structure of an integron recombination site, with two stems and one conserved unpaired G. The structure was predicted with RNAscf (3), software that performs simultaneous alignment and folding of RNAs, using the eight representative sequences as input.
Fig 2
A diagram of the constrained assembly approach. (A) Paired-end and singleton reads from a metagenomic data set. (B) Assembly of all reads with SOAPdenovo, generating contigs and a de Bruijn graph that connects them. (C) Identification of contigs that contain integron recombination repeats (shown as orange bars), followed by a depth-first search for paths that start and end at a contig with repeats. At each intermediate node, the search sorts the contigs reached by its outgoing edges by coverage and explores the highest-coverage contig first. The starting and ending contigs may be the same. (D) Validation of the assembled sequences (the paths) by read mapping; paths not supported by reads are discarded (e.g., the middle sequence in the figure). (E) Identification of the integron repeats and their exact locations in the assembled sequences, and gene prediction with FragGeneScan. Output sequences lie between two repeats (attC sites) and contain three or fewer genes. (F) Retrieval of the genes from sequences that pass all criteria.
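Step (C) can be sketched as a depth-first search over the contig graph. This is a minimal illustration under stated assumptions, not the authors' code: the graph is a hypothetical adjacency dict of contig IDs, coverage is a per-contig number, `max_depth` is an assumed cap on path length, and the read-mapping validation of step (D) is not modeled.

```python
from typing import Dict, List, Set


def find_repeat_bounded_paths(
    graph: Dict[str, List[str]],
    coverage: Dict[str, float],
    repeat_contigs: Set[str],
    max_depth: int = 10,  # assumed cap on the number of contigs per path
) -> List[List[str]]:
    """Enumerate contig paths that start and end at repeat-containing contigs."""
    paths: List[List[str]] = []

    def dfs(path: List[str]) -> None:
        node = path[-1]
        # Explore outgoing neighbors from highest to lowest coverage.
        successors = sorted(
            graph.get(node, []), key=lambda c: coverage.get(c, 0.0), reverse=True
        )
        for nxt in successors:
            if nxt in repeat_contigs:
                # Reaching a repeat contig (possibly the start itself) closes a path.
                paths.append(path + [nxt])
                continue
            if nxt in path or len(path) >= max_depth:
                continue  # skip cycles through non-repeat contigs; cap the depth
            dfs(path + [nxt])

    for start in sorted(repeat_contigs):
        dfs([start])
    return paths


# Hypothetical toy graph: R1 and R2 carry attC-like repeats.
graph = {"R1": ["a", "b"], "a": ["R2"], "b": ["c"], "c": ["R1"], "R2": []}
coverage = {"R1": 20.0, "R2": 18.0, "a": 5.0, "b": 9.0, "c": 4.0}
for p in find_repeat_bounded_paths(graph, coverage, {"R1", "R2"}):
    print(" -> ".join(p))
```

Because "b" has higher coverage than "a", the path through "b" (which returns to R1, so start and end coincide) is found first. Each candidate path would then still need read-level validation before genes are called on it.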
Fig 3
The number of integron genes discovered in simulated metagenomic data sets using different k-mer sizes. The x axis shows the k-mer size, and the y axis shows the total number of genes assembled. We generated three data sets with different coverages (10×, 20×, and 31×) and applied our constrained assembly method to each of them. Solid lines indicate the numbers of genes found, and dashed lines indicate the numbers of genes identified solely at the contig level (i.e., genes on contigs that are bounded by two integron recombination sites).
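Why k matters can be seen in a toy de Bruijn graph. The sketch below is a simplified illustration, not SOAPdenovo's implementation: nodes are (k−1)-mers, each distinct k-mer adds an edge from its prefix to its suffix, and, as an assumption for brevity, reverse complements are ignored. A short repeat ("ACG" occurs twice in the toy read) creates a branch at small k that disappears once k exceeds the repeat length.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Build a de Bruijn graph: (k-1)-mer nodes, one edge per distinct k-mer."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i : i + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix -> suffix
    return dict(graph)


def max_out_degree(graph: Dict[str, Set[str]]) -> int:
    """Largest number of distinct successors of any node (1 means no branching)."""
    return max((len(succ) for succ in graph.values()), default=0)


reads = ["ACGTACGA"]
for k in (3, 5):
    print(k, max_out_degree(de_bruijn(reads, k)))
```

With k = 3 the node "CG" branches to both "GT" and "GA"; with k = 5 the graph is a single unbranched path. Larger k resolves short repeats but needs higher coverage to keep the graph connected, which is one reason gene yields can depend on both k and coverage.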
Fig 4
Annotation of a contig from sample SRS022602 (SRS022602_Baylor_scaffold_118781) of 3,131 bp. Red diamonds indicate the two repeats identified in this contig with similarity to the attC sites in the T. denticola chromosomal integron, and the three gray boxes indicate the predicted genes. The first gene (1–407) shares 46% sequence identity and 66% similarity along 97% of the gene with a protein (YP_001868417.1) from the Nostoc punctiforme PCC 73102 genome (a nitrogen-fixing cyanobacterium). The second gene (503–1639) shares 31% identity (53% similarity) along 99% of the gene with a protein (ADE86468.1) from Rhodobacter capsulatus SB 1003 (a purple, nonsulfur photosynthetic bacterium). The third gene (1743–3131) shares 24% identity and 45% similarity, covering 88% of the gene, with a protein (ZP_04160697.1) from Bacillus mycoides Rock3-17 (a Gram-positive, nonmotile soil bacterium); this gene also shares 24% sequence identity and 46% similarity (covering 65% of the gene) with a protein (YP_002158281.1, nuclease-related domain family protein, NERD) from Vibrio fischeri MJ11 (20).
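The identity, similarity, and coverage figures quoted above come from pairwise protein alignments. The sketch below shows one common way to compute them from two gapped, aligned strings; it is an illustration, not BLAST's code. As loudly stated assumptions, "similarity" uses a hypothetical grouping of physicochemically similar residues (`SIMILAR_GROUPS`) instead of positive scores from a substitution matrix such as BLOSUM62, identity and similarity are divided by alignment length, and coverage is aligned query residues over full query length.

```python
from typing import Tuple

# Hypothetical, simplified residue groups standing in for a substitution matrix.
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "ST", "NQ", "AG"]


def _similar(a: str, b: str) -> bool:
    """True if two residues are identical or fall in the same assumed group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def alignment_stats(q_aln: str, s_aln: str, query_length: int) -> Tuple[float, float, float]:
    """Return (identity, similarity, query coverage) as fractions."""
    if len(q_aln) != len(s_aln):
        raise ValueError("aligned sequences must have equal length")
    columns = len(q_aln)
    pairs = [(a, b) for a, b in zip(q_aln, s_aln) if a != "-" and b != "-"]
    identical = sum(1 for a, b in pairs if a == b)
    similar = sum(1 for a, b in pairs if _similar(a, b))
    query_residues = sum(1 for a in q_aln if a != "-")
    return identical / columns, similar / columns, query_residues / query_length


identity, similarity, cov = alignment_stats("MKV-LE", "MRIALD", query_length=10)
print(f"{identity:.0%} identity, {similarity:.0%} similarity, {cov:.0%} of query")
```

Different tools use different denominators (alignment length, shorter sequence, or query length), so percentages from different pipelines are not always directly comparable.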
Fig 5
Taxonomic assignments of the integron genes by MEGAN. The numbers following clade names are the numbers of genes assigned to that taxonomic rank, not including the genes assigned to the taxa below that rank (for example, there are 63 genes assigned to T. denticola species, 49 genes assigned to strain ATCC 35405, and 40 genes assigned to strain F0402; in total, 138 genes can be assigned to the T. denticola species).
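MEGAN places each gene at the lowest common ancestor (LCA) of the taxa it hits, so the count shown next to a clade excludes genes placed in its descendants, and a clade total adds them back. The sketch below illustrates that bookkeeping under simplifying assumptions: lineages are hypothetical, abbreviated root-to-leaf tuples, every hit counts equally (MEGAN's score and support filters are omitted), and the gene-to-hit data are invented toy values, not the study's.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Lineage = Tuple[str, ...]  # root-to-leaf taxon names (abbreviated, hypothetical)


def lca(lineages: Sequence[Lineage]) -> Lineage:
    """Lowest common ancestor = longest common prefix of the lineages."""
    prefix: List[str] = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            prefix.append(ranks[0])
        else:
            break
    return tuple(prefix)


def assign(hits_per_gene: Dict[str, Sequence[Lineage]]) -> Counter:
    """Count genes placed at each node (genes without hits are skipped)."""
    return Counter(lca(hits) for hits in hits_per_gene.values() if hits)


def clade_total(assigned: Counter, clade: Lineage) -> int:
    """Genes placed at the clade itself plus all of its descendants."""
    return sum(n for node, n in assigned.items() if node[: len(clade)] == clade)


td = ("Bacteria", "Spirochaetes", "Treponema", "T. denticola")
s1, s2 = td + ("ATCC 35405",), td + ("F0402",)
hits = {"g1": [s1], "g2": [s1, s2], "g3": [s2], "g4": [s1],
        "g5": [("Bacteria", "Firmicutes"), s1]}
assigned = assign(hits)
print(assigned[td], clade_total(assigned, td))
```

Here one gene sits at the species node itself, but four genes fall within the species once the two strain-level nodes are included.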
Fig 6
Sharing of gene cassettes among the samples. In this map, rows are the samples and columns are the genes found in the integron gene cassettes, clustered at 70% amino acid sequence identity with CD-HIT. A red cell means that the corresponding gene is present in the corresponding sample. Samples are named SRS ID_individual ID_female/male_body site_location. Note that some samples come from the same individual (e.g., two samples collected from a female with individual ID 761143397 are highlighted with blue triangles, and two samples from individual 160158126 are highlighted with orange diamonds).
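A map like this is a binary presence/absence matrix, and the degree of sharing can be read from its column sums. The sketch below is an illustration with invented sample and cluster IDs (not the study's data); it assumes each sample has already been reduced to the set of gene-cluster IDs it contains, for example after clustering as described in the caption.

```python
from typing import Dict, List, Set, Tuple


def presence_matrix(
    sample_genes: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Rows = samples, columns = gene clusters, cell = 1 if present."""
    samples = sorted(sample_genes)
    genes = sorted(set().union(*sample_genes.values()))
    matrix = [[int(g in sample_genes[s]) for g in genes] for s in samples]
    return samples, genes, matrix


def shared_genes(sample_genes: Dict[str, Set[str]], min_samples: int = 2) -> List[str]:
    """Gene clusters present in at least `min_samples` samples."""
    _, genes, matrix = presence_matrix(sample_genes)
    counts = [sum(row[j] for row in matrix) for j in range(len(genes))]
    return [g for g, c in zip(genes, counts) if c >= min_samples]


# Hypothetical toy data: three samples, five gene clusters.
data = {"S1": {"c1", "c2", "c3"}, "S2": {"c2", "c4"}, "S3": {"c5"}}
samples, genes, matrix = presence_matrix(data)
for s, row in zip(samples, matrix):
    print(s, row)
print("shared:", shared_genes(data))
```

In this toy example only one of five clusters occurs in more than one sample, the kind of sparse sharing the map visualizes.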

References

    1. Altschul SF, et al. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
    2. Arumugam M, et al. 2011. Enterotypes of the human gut microbiome. Nature 473:174–180.
    3. Bafna V, Tang H, Zhang S. 2006. Consensus folding of unaligned RNA sequences revisited. J. Comput. Biol. 13:283–295.
    4. Boucher Y, Labbate M, Koenig JE, Stokes HW. 2007. Integrons: mobilizable platforms that promote genetic diversity in bacteria. Trends Microbiol. 15:301–309.
    5. Cambray G, Guerout AM, Mazel D. 2010. Integrons. Annu. Rev. Genet. 44:141–166.
