Cell compartmentalization serves both the isolation and the specialization of cell functions. After synthesis in the cytoplasm, over a third of all proteins are targeted to other subcellular compartments. Knowing how proteins are distributed within the cell and how they interact is a prerequisite for understanding the cell as a whole. Surface and secreted proteins are important pathogenicity determinants. Here we present the STEP database (STEPdb), which contains a comprehensive characterization of the subcellular localization and topology of the complete proteome of Escherichia coli. Two widely used E. coli proteomes (K-12 and BL21) are presented, organized into thirteen subcellular classes. STEPdb exploits the wealth of genetic, proteomic, biochemical, and functional information on protein localization, secretion, and targeting in E. coli, one of the best understood model organisms. Subcellular annotations were derived from a combination of bioinformatic predictions; proteomic, biochemical, functional, and topological data; and extensive literature re-examination, and were refined through manual curation. Strong experimental support for the location of 1553 of the 4303 proteins was drawn from 426 articles, and some experimental indications were found for another 526. Annotations were provided for another 320 proteins based on firm bioinformatic predictions. STEPdb is the first database that contains an extensive set of peripheral inner membrane (IM) proteins (PIM proteins) and includes their graphical visualization in terms of complexes, cellular functions, and interactions. It also summarizes all currently known protein export machineries of E. coli K-12 and pairs them, where available, with the secretory proteins that use them. It catalogs the Sec- and TAT-utilizing secretomes and summarizes their topological features, such as signal peptides, transmembrane regions, and transmembrane topologies and orientations.
It also catalogs physicochemical and structural features that influence topology, such as abundance, solubility, disorder, heat resistance, and structural domain families. Finally, STEPdb incorporates prediction tools for topology (TMHMM, SignalP, and Phobius) and disorder (IUPred) and implements BLAST2STEP, which performs protein homology searches against STEPdb.
© 2014 by The American Society for Biochemistry and Molecular Biology, Inc.