Distinguishable codon usage and amino acid composition patterns among substrates of leaderless secretory pathways from proteobacteria

Appl Microbiol Biotechnol. 2010 Mar;86(1):285-93. doi: 10.1007/s00253-009-2423-8. Epub 2010 Jan 27.


The combined set of codon usage frequencies (61 sense codons) from 111 annotated sequences of leaderless secreted type I, type III, type IV, and type VI proteins from proteobacteria was subjected to forward and backward selection to obtain the most effective combination of predictor variables for classification and prediction. A group of 24 codon frequencies displayed strong discriminatory power, with an accuracy of 100% for the originally grouped cases and 97.3 +/- 1.6% for leave-one-out cross-validated (LOOCV) cases, and an acceptable error rate (0.062 +/- 0.012) in k-fold (k = 6) cross-validation (KCV). The summary frequencies of synonymous codons for ten amino acids, used as alternative predictor variables, showed comparable discriminatory power (92.8 +/- 2.5% for LOOCV), although with a somewhat higher error rate in KCV (0.106 +/- 0.015). A number of significant (p < 0.001) differences in indices of codon usage and amino acid composition were found among the secretion types. About 60% of the secretion substrates were characterized as putative alien genes or as apparently originating from horizontal gene transfer events, and they were unequally distributed among the groups. The proposed prediction approaches could be used to identify secretome proteins from genomic sequences as well as to assess the compatibility between bacterial secretion pathways and secretion substrates.
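
The two kinds of predictor variables named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes that a codon frequency is the codon's count divided by the total number of sense codons in the coding sequence, and that a "summary frequency" for an amino acid is the sum over its synonymous codons. The abstract states neither definition explicitly. The function names `codon_frequencies` and `amino_acid_frequencies` are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order (assumed; the abstract does not
# say which translation table was used).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# The 61 sense codons (all codons except the three stop codons).
SENSE_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa != "*")


def codon_frequencies(cds):
    """Return the relative frequency of each of the 61 sense codons.

    Assumption: frequency = count / total sense codons in the sequence.
    Stop codons and codons with ambiguous bases are ignored.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no sense codons")
    return {c: counts[c] / total for c in SENSE_CODONS}


def amino_acid_frequencies(freqs):
    """Sum the frequencies of synonymous codons for each amino acid."""
    summed = {}
    for codon, freq in freqs.items():
        aa = CODON_TABLE[codon]
        summed[aa] = summed.get(aa, 0.0) + freq
    return summed


if __name__ == "__main__":
    # Toy coding sequence, not taken from the study's data set.
    freqs = codon_frequencies("ATGGCTGCCAAATAA")
    print(len(freqs), freqs["GCT"], amino_acid_frequencies(freqs)["A"])
```

Vectors of this kind, one per sequence, are the sort of input that a variable-selection step and a discriminant classifier would then operate on. Those steps are not shown here.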

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / chemistry*
  • Amino Acids / genetics
  • Bacterial Proteins / chemistry
  • Bacterial Proteins / genetics
  • Bacterial Proteins / metabolism*
  • Base Composition
  • Codon / chemistry
  • Codon / genetics*
  • Computational Biology
  • Discriminant Analysis
  • Proteobacteria* / chemistry
  • Proteobacteria* / genetics
  • Proteobacteria* / metabolism

Substances

  • Amino Acids
  • Bacterial Proteins
  • Codon