SuperPhy: predictive genomics for the bacterial pathogen Escherichia coli

BMC Microbiol. 2016 Apr 12;16:65. doi: 10.1186/s12866-016-0680-0.


Background: Predictive genomics is the translation of raw genome sequence data into a phenotypic assessment of the organism. For bacterial pathogens, these phenotypes can range from environmental survivability to the severity of human disease. Significant progress has been made in the development of generic tools for genomic analyses that are broadly applicable to all microorganisms; however, a fundamental missing component is the ability to analyze genomic data in the context of organism-specific phenotypic knowledge, which has been accumulated over decades of research and can provide a meaningful interpretation of genome sequence data.

Results: In this study, we present SuperPhy, an online predictive genomics platform for Escherichia coli. The platform integrates the analytical tools and genome sequence data for all publicly available E. coli genomes and facilitates the upload of new genome sequences from users under public or private settings. SuperPhy provides real-time analyses of thousands of genome sequences with results that are understandable and useful to a wide community, including those in the fields of clinical medicine, epidemiology, ecology, and evolution. SuperPhy includes identification of: 1) virulence and antimicrobial resistance determinants; 2) statistical associations between genotypes, biomarkers, geospatial distribution, host, source, and phylogenetic clade; 3) biomarkers for groups of genomes based on the presence/absence of specific genomic regions and single-nucleotide polymorphisms; and 4) in silico Shiga-toxin subtype.
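As an illustration of what a presence/absence association between two groups of genomes can look like, the following is a minimal sketch in Python. It is not SuperPhy's code: the abstract does not name the statistical test used, so the choice of a two-sided Fisher's exact test is an assumption, and the function names (`fisher_exact_two_sided`, `marker_association`) and the genome identifiers are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are genome groups; columns are marker present / marker absent.
    (Assumed test; the abstract does not specify which one is used.)
    """
    row1, row2 = a + b, c + d          # group sizes
    col1 = a + c                       # genomes carrying the marker
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x carriers fall in group 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def marker_association(presence, group_a, group_b):
    """Build the 2x2 table for one marker and return (table, p_value).

    presence: dict mapping genome id -> True if the marker is present.
    """
    a = sum(presence[g] for g in group_a)
    b = len(group_a) - a
    c = sum(presence[g] for g in group_b)
    d = len(group_b) - c
    return (a, b, c, d), fisher_exact_two_sided(a, b, c, d)


# Toy data (made up for illustration): a marker found in every genome of
# one group and in none of the other.
presence = {"g1": True, "g2": True, "g3": True, "g4": True,
            "g5": False, "g6": False, "g7": False, "g8": False}
table, p_value = marker_association(
    presence, ["g1", "g2", "g3", "g4"], ["g5", "g6", "g7", "g8"]
)
```

In this sketch a small p-value would flag the marker as a candidate biomarker for one group. Screening thousands of genomic regions or SNPs this way would also call for a multiple-testing correction, which is omitted here.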

Conclusions: SuperPhy is a predictive genomics platform that attempts to provide an essential link between the vast amounts of genome information currently being generated and phenotypic knowledge in an organism-specific context.

Keywords: Anti-microbial resistance; Bioinformatics; Comparative genomics; Epidemiology; Population genomics; Software; Virulence factors.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Nucleic Acid
  • Drug Resistance, Bacterial
  • Escherichia coli / genetics*
  • Genome, Bacterial*
  • Genomics / methods*
  • Phenotype
  • Sequence Analysis, DNA
  • Software
  • Virulence Factors / genetics


Substances

  • Virulence Factors