BMC Genomics. 2014 Oct 9;15(1):882.
doi: 10.1186/1471-2164-15-882.

An evolutionary analysis of genome expansion and pathogenicity in Escherichia coli

Jon Bohlin et al. BMC Genomics. 2014.

Abstract

Background: There are several studies describing loss of genes through reductive evolution in microbes, but how selective forces are associated with genome expansion due to horizontal gene transfer (HGT) has not received similar attention. The aim of this study was therefore to examine how selective pressures influence genome expansion in 53 fully sequenced and assembled Escherichia coli strains. We also explored potential connections between genome expansion and the attainment of virulence factors. This was performed using estimations of several genomic parameters such as AT content, genomic drift (measured using relative entropy), genome size and estimated HGT size, which were subsequently compared to analogous parameters computed from the core genome consisting of 1729 genes common to the 53 E. coli strains. Moreover, we analyzed how selective pressures (quantified using relative entropy and dN/dS), acting on the E. coli core genome, influenced lineage and phylogroup formation.
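
Two of the genomic parameters named above, AT content and relative entropy, can be illustrated with a minimal Python sketch. The abstract does not say how relative entropy was computed, so the formulation here is an assumption: the Kullback-Leibler divergence (in bits) between observed dinucleotide frequencies and those expected if adjacent bases were independent. The function names `at_content` and `relative_entropy` are hypothetical and do not come from the study.

```python
from collections import Counter
from math import log2

BASES = set("ACGT")

def at_content(seq: str) -> float:
    """Fraction of A and T among the unambiguous bases (A, C, G, T)."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (counts["A"] + counts["T"]) / total

def relative_entropy(seq: str) -> float:
    """KL divergence (bits) of observed dinucleotide frequencies from the
    frequencies expected under independent bases.

    ASSUMPTION: this is one common definition; the study's exact
    formulation is not given in the abstract.
    """
    seq = seq.upper()
    # Overlapping dinucleotides made only of unambiguous bases.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if set(p) <= BASES]
    if not pairs:
        raise ValueError("sequence contains no usable dinucleotides")
    pair_counts = Counter(pairs)
    n_pairs = len(pairs)

    # Single-base frequencies from all unambiguous bases in the sequence.
    base_counts = Counter(b for b in seq if b in BASES)
    n_bases = sum(base_counts.values())
    base_freq = {b: c / n_bases for b, c in base_counts.items()}

    divergence = 0.0
    for pair, count in pair_counts.items():
        observed = count / n_pairs
        expected = base_freq[pair[0]] * base_freq[pair[1]]
        divergence += observed * log2(observed / expected)
    return divergence
```

Under this definition a value of zero means the dinucleotide composition is exactly what the base composition predicts, and larger values indicate stronger compositional bias. Applying both functions to a whole genome and to its concatenated core genes would give the kind of core versus whole-genome comparison the abstract describes.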

Results: Hierarchical clustering of dS and dN estimates from the E. coli core genome resulted in phylogenetic trees with topologies in agreement with known E. coli taxonomy and phylogroups. High values of dS, compared to dN, indicate that the E. coli core genome has been subjected to substantial purifying selection over time, significantly more than the non-core part of the genome (p < 0.001). This is further supported by a linear association between strain-wise dS and dN values (β = 26.94 ± 0.44, R² ≈ 0.98, p < 0.001). The non-core part of the genome was also significantly more AT-rich (p < 0.001) than the core genome, and E. coli genome size correlated with estimated HGT size (p < 0.001). In addition, genome size (p < 0.001), AT content (p < 0.001) as well as estimated HGT size (p < 0.005) were all associated with the presence of virulence factors, suggesting that pathogenicity traits in E. coli are largely attained through HGT. No associations were found between selective pressures operating on the E. coli core genome, as estimated using relative entropy, and genome size (p ≈ 0.98).
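
The linear association between strain-wise dS and dN reported above can be illustrated with an ordinary least-squares fit. The sketch below is not the authors' analysis: it assumes a simple one-predictor model with dS as the response and dN as the predictor (the abstract does not state the direction), and the helper name `ols_fit` is hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    ASSUMPTION: a plain single-predictor model; the study's actual
    regression setup is not described in the abstract.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance; the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination: 1 - residual SS / total SS.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Made-up illustrative values, NOT data from the study.
    median_dn = [0.001, 0.002, 0.003, 0.004]
    median_ds = [0.030, 0.055, 0.082, 0.110]
    slope, intercept, r2 = ols_fit(median_dn, median_ds)
    print(f"slope={slope:.2f} intercept={intercept:.4f} R^2={r2:.3f}")
```

A slope well above 1 in such a fit means synonymous change accumulates much faster than non-synonymous change across strains, which is the pattern the abstract interprets as purifying selection on the core genome.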

Conclusions: On a larger time frame, genome expansion in E. coli, which is significantly associated with the acquisition of virulence factors, appears to be independent of selective forces operating on the core genome.

Figures

Figure 1
dS-based heatmap. The heatmap shows a hierarchical cluster analysis of estimated dS (the rate of synonymous substitutions between taxa) of 1729 core genes from the 53 E. coli genomes. The differently colored labels designate phylogroups: D (light green), B2 (red), E (green), B1 (blue), and A (dark blue). Groups D, B2 and E consisted predominantly of pathogens; Group A was almost exclusively non-pathogenic, while Group B1 consisted of a mixture of pathogenic and non-pathogenic strains.
Figure 2
dN-based heatmap. The heatmap shows a hierarchical cluster analysis of estimated dN (the rate of non-synonymous substitutions between taxa) of 1729 core genes from the 53 E. coli genomes. The differently colored labels designate phylogroups: D (light green), B2 (red), E (green), B1 (blue), and A (dark blue). Groups D, B2 and E consisted predominantly of pathogens; Group A was almost exclusively non-pathogenic, while Group B1 consisted of a mixture of pathogenic and non-pathogenic strains.
Figure 3
Regression plot of strain-wise median dS and dN. The figure shows median dS estimates plotted against median dN estimates for the E. coli strains in the study. The diagonal line designates the estimated regression line. All similar and clonal strains were removed for the regression analysis resulting in a sample size of 36 strains.
Figure 4
dS/dN-based heatmap. The heatmap shows a hierarchical cluster analysis of estimated dS/dN (the ratio of synonymous to non-synonymous substitution rates between taxa) of 1729 core genes from the 53 E. coli genomes. The differently colored labels designate phylogroups: D (light green), B2 (red), E (green), B1 (blue), and A (dark blue). Groups D, B2 and E consisted predominantly of pathogens; Group A was almost exclusively non-pathogenic, while Group B1 consisted of a mixture of pathogenic and non-pathogenic strains. The horizontal axis of the color key legend indicates multiples of dS to dN, where values close to 1 designate neutrality of selection.
Figure 5
mutT-based phylogenetic tree. The phylogenetic tree is based on alignments of the mutT gene found in the core genome of all 53 E. coli strains. The numbers close to the branches represent bootstrap support. The differently colored labels designate phylogroups: D (light green), B2 (red), E (green), B1 (dark blue), and A (blue). Groups D, B2 and E consisted predominantly of pathogens; Group A was almost exclusively non-pathogenic, while Group B1 consisted of a mixture of pathogenic and non-pathogenic strains.
Figure 6
Core genome relative entropy and AT content. The figure consists of two panels of boxplots displaying the difference between core- and whole genome relative entropy (left), and core- and whole genome fraction of AT content (right) in all 53 E. coli strains.
Figure 7
Statistical analyses of genomic properties in 53 E. coli strains. The figure consists of four panels showing different associations between selected genomic properties of pathogenic (red dots) and non-pathogenic (green dots) E. coli strains. The blue line denotes the estimated regression line, which was significant for all panels (p < 0.05). The top left panel shows genomic AT content versus chromosome size, while the top right panel shows estimated HGT size versus genomic AT content. The bottom right panel shows whole genome relative entropy versus genome size, and the bottom left panel shows whole genome relative entropy plotted against genomic fraction of AT.
