ngLOC: an n-gram-based Bayesian method for estimating the subcellular proteomes of eukaryotes

Genome Biol. 2007;8(5):R68. doi: 10.1186/gb-2007-8-5-r68.


We present ngLOC, an n-gram-based Bayesian classifier that predicts the localization of a protein sequence across ten distinct subcellular organelles. In tenfold cross-validation, ngLOC achieves an accuracy of 89% for sequences localized to a single organelle and 82% for those localized to multiple organelles. An enhanced version of ngLOC was developed to estimate the subcellular proteomes of eight eukaryotic organisms: yeast, nematode, fruit fly, mosquito, zebrafish, chicken, mouse, and human.
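
To illustrate the general idea of an n-gram-based Bayesian classifier, here is a minimal sketch of a naive Bayes model over overlapping n-grams of a protein sequence. It is not the ngLOC implementation: the class name `NGramNaiveBayes`, the default n-gram length, the add-alpha smoothing, and the toy sequences and labels are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def ngrams(seq, n):
    """Return all overlapping n-grams (length-n substrings) of a sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


class NGramNaiveBayes:
    """Hypothetical naive Bayes classifier over n-grams (illustration only)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n                          # n-gram length (assumed default)
        self.alpha = alpha                  # add-alpha smoothing (assumed)
        self.counts = defaultdict(Counter)  # class -> n-gram counts
        self.class_totals = Counter()       # class -> number of training sequences
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            grams = ngrams(seq, self.n)
            self.counts[label].update(grams)
            self.class_totals[label] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, seq):
        """Return log P(class) + sum of log P(n-gram | class) for each class."""
        total_seqs = sum(self.class_totals.values())
        vocab_size = len(self.vocab) + 1    # one extra slot for unseen n-grams
        scores = {}
        for label, n_seqs in self.class_totals.items():
            gram_counts = self.counts[label]
            denom = sum(gram_counts.values()) + self.alpha * vocab_size
            score = math.log(n_seqs / total_seqs)
            for gram in ngrams(seq, self.n):
                score += math.log((gram_counts[gram] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, seq):
        """Return the class with the highest posterior log score."""
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


# Toy, made-up sequences and labels; not real proteins or real annotations.
train_seqs = ["KKKRKKKRKK", "RKKRKRKKRR", "LLLVLLAVLL", "AVLLLAVLLV"]
train_labels = ["nuclear", "nuclear", "membrane", "membrane"]

model = NGramNaiveBayes(n=2).fit(train_seqs, train_labels)
print(model.predict("KRKKRK"))  # nuclear
print(model.predict("LLAVLL"))  # membrane
```

Working in log space avoids numerical underflow when many small n-gram probabilities are multiplied, and the smoothing term keeps an n-gram unseen in one class from driving that class's probability to zero.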

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Bayes Theorem*
  • Classification
  • Eukaryotic Cells / ultrastructure*
  • Humans
  • Organelles / chemistry*
  • Proteins / analysis
  • Proteome / analysis*
  • Tissue Distribution
  • Yeasts


Substances

  • Proteins
  • Proteome