Motivation: Bacteria frequently acquire advantageous genes that enable them to adapt to rapidly changing niches, which leads to wide intraspecific variation in genome content and to genetic redundancy. The minimal genome of Escherichia coli, one of the most important bacterial species, and the association between E.coli and its human host are worthy of further exploration.
Results: Using gene prediction and phylogenetic analysis, we found rich phylogenetic diversity among 491 E.coli strains, along with substantial differences between strains in gene number and genome length. Pan-genomic analysis identified 867 core genes, of which only 243 are also essential genes. Core genes mainly provide functions essential to the basic lifestyle of E.coli, whereas accessory genes are likely to confer selective advantages such as niche adaptation or the ability to colonize specific hosts. Association analysis suggested that E.coli strains in non-human hosts may more readily use foreign genetic material to adapt to their surroundings, whereas populations in human hosts place higher demands on the control of population density, indicating that highly accurate quorum sensing is very important for harmony between E.coli and its human host. By considering core genes and previous deletions together, we propose a potential direction for further reduction of the E.coli genome.
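As a rough illustration of the core/accessory split described above, the following sketch treats each strain as a set of gene-family identifiers and takes the core genome as the set intersection across strains. It assumes genes have already been clustered into shared families, and it uses a strict definition of "core" (present in every strain), which may differ from the threshold used in the study. The function name `core_and_accessory`, the strain names, and the gene labels are hypothetical toy values, not data from the paper.

```python
from typing import Dict, Set, Tuple


def core_and_accessory(strain_genes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split a pan-genome into core genes (present in every strain)
    and accessory genes (present in at least one strain, but not all)."""
    gene_sets = list(strain_genes.values())
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)        # every gene seen in any strain
    core = set.intersection(*gene_sets)  # genes shared by all strains
    return core, pan - core


# Toy input (hypothetical): three strains with made-up gene-family labels.
strains = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g2", "g6"},
}
# Hypothetical list of experimentally essential genes.
essential = {"g1", "g7"}

core, accessory = core_and_accessory(strains)
core_essential = core & essential  # core genes that are also essential

print(sorted(core), sorted(accessory), sorted(core_essential))
```

In this toy case the overlap between the core and essential sets is smaller than the core itself, mirroring the kind of comparison reported above (867 core genes, 243 of them essential).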
Availability and implementation: The data, analysis process and detailed information on software tools used in this study are all available in the supplementary material.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2018. Published by Oxford University Press. All rights reserved.