How Many Potentially Secreted Proteins Are Contained in a Bacterial Genome?

Gene. 1999 Sep 3;237(1):113-21. doi: 10.1016/s0378-1119(99)00310-8.


Artificial neural networks were trained to predict the subcellular location of bacterial proteins. A cross-validated average prediction accuracy of 93% was reached for the distinction between cytoplasmic and non-cytoplasmic proteins, based on an analysis of protein amino-acid composition. Principal component analysis and self-organizing maps were used to create graphical representations of amino-acid sequence space. A clear separation of cytoplasmic, periplasmic, and extracellular proteins was observed. The neural network system was applied to predict potentially secreted proteins in 15 complete genomes. For mesophilic bacteria, the predicted fractions of non-cytoplasmic proteins range between 15% and 30%, in agreement with previously published estimates. Characteristics of thermophilic genomes might lead presently available prediction systems to underestimate the fraction of secreted proteins. A self-organizing map was constructed from all 15 bacterial genomes. This technique can reveal additional sequence features independent of exhaustive pair-wise sequence alignment. The Treponema pallidum and Mycobacterium tuberculosis data formed separate clusters, indicating unusual characteristics of these genomes.
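
The abstract states that the predictions rest on amino-acid composition and that the classifier was applied genome-wide to estimate the fraction of non-cytoplasmic proteins. The following is a minimal sketch of those two steps only, not the authors' implementation: the names `aa_composition` and `non_cytoplasmic_fraction` are hypothetical, and the `classifier` argument is a placeholder for a trained model (the paper used neural networks, which are not reproduced here).

```python
from collections import Counter

# The 20 standard residues in one-letter code; the ordering is an arbitrary choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the relative frequency of each standard amino acid in `sequence`.

    Characters outside the 20 standard residues (e.g. X, *, gaps) are ignored.
    A sequence with no standard residues yields an all-zero vector.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def non_cytoplasmic_fraction(sequences, classifier):
    """Fraction of `sequences` that `classifier` labels as non-cytoplasmic.

    `classifier` is a hypothetical stand-in: any callable that takes a
    20-element composition vector and returns True for non-cytoplasmic.
    """
    if not sequences:
        return 0.0
    flagged = sum(1 for seq in sequences if classifier(aa_composition(seq)))
    return flagged / len(sequences)
```

Passing the classifier as a callable keeps the feature extraction separate from the model, so any trained predictor that accepts a 20-element composition vector could be substituted.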

Publication types

  • Comparative Study

MeSH terms

  • Amino Acid Sequence
  • Amino Acids / analysis*
  • Bacterial Proteins / chemistry
  • Bacterial Proteins / genetics
  • Bacterial Proteins / metabolism*
  • Genome, Bacterial*
  • Models, Biological*
  • Molecular Sequence Data
  • Neural Networks, Computer*
  • Predictive Value of Tests
  • Software

Substances

  • Amino Acids
  • Bacterial Proteins