Proteome-wide subcellular topologies of E. coli polypeptides database (STEPdb)

Mol Cell Proteomics. 2014 Dec;13(12):3674-87. doi: 10.1074/mcp.O114.041137. Epub 2014 Sep 10.


Cell compartmentalization serves both the isolation and the specialization of cell functions. After synthesis in the cytoplasm, over a third of all proteins are targeted to other subcellular compartments. Knowing how proteins are distributed within the cell and how they interact is a prerequisite for understanding the cell as a whole. Surface and secreted proteins are important pathogenicity determinants. Here we present the STEP database (STEPdb), which contains a comprehensive characterization of the subcellular localization and topology of the complete proteome of Escherichia coli. Two widely used E. coli proteomes (K-12 and BL21) are presented, organized into thirteen subcellular classes. STEPdb exploits the wealth of genetic, proteomic, biochemical, and functional information on protein localization, secretion, and targeting in E. coli, one of the best understood model organisms. Subcellular annotations were derived from a combination of bioinformatic prediction; proteomic, biochemical, functional, and topological data; and extensive literature re-examination, and were then refined through manual curation. Strong experimental support for the location of 1553 of the 4303 proteins was drawn from 426 articles, and some experimental indications were found for another 526. Annotations were provided for another 320 proteins based on firm bioinformatic predictions. STEPdb is the first database that contains an extensive set of peripheral inner membrane (IM) proteins (PIM proteins) and includes graphical visualization of their complexes, cellular functions, and interactions. It also summarizes all currently known protein export machineries of E. coli K-12 and pairs them, where available, with the secretory proteins that use them. It catalogs the Sec- and TAT-utilizing secretomes and summarizes their topological features, such as signal peptides, transmembrane regions, and transmembrane topologies and orientations. It also catalogs physicochemical and structural features that influence topology, such as abundance, solubility, disorder, heat resistance, and structural domain families. Finally, STEPdb incorporates prediction tools for topology (TMHMM, SignalP, and Phobius) and disorder (IUPred) and implements BLAST2STEP, which performs protein homology searches against STEPdb.
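
The annotation counts quoted in the abstract can be tallied as a quick check of how much of the proteome each level of support covers. The sketch below is illustrative only: the tier names and the `coverage` helper are hypothetical, and it assumes the three groups do not overlap (as the repeated "another" suggests). It does not describe how STEPdb itself stores or computes these figures, and the abstract does not characterize the support for the remaining proteins.

```python
# Counts quoted in the abstract; the tier labels are illustrative names,
# not identifiers used by STEPdb.
TOTAL_PROTEINS = 4303
support_tiers = {
    "strong_experimental": 1553,
    "some_experimental": 526,
    "firm_bioinformatic": 320,
}


def coverage(tiers, total):
    """Return (count, fraction) of proteins covered by the listed tiers.

    Assumes the tiers are disjoint, so their counts can simply be summed.
    """
    covered = sum(tiers.values())
    return covered, covered / total


covered, fraction = coverage(support_tiers, TOTAL_PROTEINS)

# Report each tier and the combined share of the 4303-protein proteome.
for name, count in support_tiers.items():
    print(f"{name}: {count} ({count / TOTAL_PROTEINS:.1%})")
print(f"all three tiers: {covered} of {TOTAL_PROTEINS} ({fraction:.1%})")
```

Under the non-overlap assumption, the three tiers together account for 2399 of the 4303 proteins.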

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cell Compartmentation
  • Computational Biology
  • Databases, Protein*
  • Escherichia coli / classification
  • Escherichia coli / genetics*
  • Escherichia coli / metabolism
  • Escherichia coli / pathogenicity
  • Escherichia coli Proteins / chemistry*
  • Escherichia coli Proteins / genetics
  • Escherichia coli Proteins / metabolism
  • Gene Expression
  • Molecular Sequence Annotation
  • Peptides / chemistry*
  • Peptides / genetics
  • Peptides / metabolism
  • Protein Folding
  • Protein Multimerization
  • Protein Structure, Tertiary
  • Proteome / chemistry*
  • Proteome / genetics
  • Proteome / metabolism
  • Structural Homology, Protein
  • Virulence Factors / chemistry*
  • Virulence Factors / genetics
  • Virulence Factors / metabolism

Substances

  • Escherichia coli Proteins
  • Peptides
  • Proteome
  • Virulence Factors