Comparison of theoretical proteomes: identification of COGs with conserved and variable pI within the multimodal pI distribution

BMC Genomics. 2005 Sep 9;6:116. doi: 10.1186/1471-2164-6-116.

Abstract

Background: Theoretical proteome analysis, in which the theoretical isoelectric points (pI) of all proteins encoded by a genome are plotted against their molecular masses, shows a multimodal distribution for pI. This multimodal distribution is an effect of the allowed combinations of charged amino acids, and is not due to evolutionary causes. The variation in this distribution can be correlated with the organism's ecological niche. Contributions to this variation may be mapped to individual proteins by studying the variation in pI of orthologs across microorganism genomes.
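As an illustration of how a theoretical pI can be derived from sequence alone, the following is a minimal sketch that finds the pH at which the Henderson-Hasselbalch net charge of a protein is zero. The pKa values, the function names `net_charge` and `theoretical_pi`, and the bisection tolerance are assumptions chosen for this example; the abstract does not state which pKa set or software the authors used, and different published pKa sets shift the result slightly.

```python
# Assumed pKa values (one textbook-style set); not taken from the paper.
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0, "Nterm": 9.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "Cterm": 2.0}


def net_charge(seq, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    seq = seq.upper()
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        # Each chain has exactly one N-terminus; residues are counted.
        n = 1 if group == "Nterm" else seq.count(group)
        charge += n / (1.0 + 10 ** (ph - pka))
    for group, pka in PKA_NEGATIVE.items():
        n = 1 if group == "Cterm" else seq.count(group)
        charge -= n / (1.0 + 10 ** (pka - ph))
    return charge


def theoretical_pi(seq, tol=1e-4):
    """Bisection on pH in [0, 14]; net charge decreases as pH rises."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for name, seq in [("acidic", "MDDEEDLLAE"), ("basic", "MKKRRKLLAK")]:
        print(name, round(theoretical_pi(seq), 2))
```

Applying `theoretical_pi` to every predicted protein of a genome, and pairing each value with the protein's molecular mass, would give the kind of theoretical proteome plot described above.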

Results: The distribution of ortholog pI values showed trimodal distributions for all prokaryotic genomes analyzed, similar to whole proteome plots. Pairwise analysis of pI variation shows that a few COGs are conserved within, but most vary between, the acidic and basic regions of the distribution, while molecular mass is more highly conserved. At the level of functional grouping of orthologs, five groups vary significantly from the population of orthologs, which is attributed either to conservation at the level of sequences or to a bias for positively or negatively charged residues contributing to the function. Individual COGs conserved in both the acidic and basic regions of the trimodal distribution are identified, and orthologs that best represent the variation in levels of the acidic and basic regions are listed.
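The distinction between COGs whose pI is conserved within one region and COGs whose pI varies between regions can be sketched as follows. This is only an illustration: the single cut at pI 7.0, the labels, and the function names `classify_cog` and `max_pairwise_delta` are hypothetical, and the abstract does not give the region boundaries or the statistical test the authors actually applied.

```python
from itertools import combinations


def max_pairwise_delta(pis):
    """Largest absolute pI difference over all ortholog pairs in a COG."""
    return max((abs(a - b) for a, b in combinations(pis, 2)), default=0.0)


def classify_cog(pis, cut=7.0):
    """Label a COG by where its orthologs fall relative to an assumed cut.

    `cut` is a hypothetical boundary between the acidic and basic
    regions, used here for illustration only.
    """
    is_acidic = [p < cut for p in pis]
    if all(is_acidic):
        return "conserved_acidic"
    if not any(is_acidic):
        return "conserved_basic"
    return "variable"


if __name__ == "__main__":
    # Made-up pI values for three hypothetical COGs across four genomes.
    example = {
        "cog_a": [4.8, 5.1, 5.5, 4.9],
        "cog_b": [9.6, 10.2, 9.9, 10.4],
        "cog_c": [5.2, 9.1, 6.0, 8.7],
    }
    for cog, pis in example.items():
        print(cog, classify_cog(pis), round(max_pairwise_delta(pis), 2))
```

The pI values in the example are invented to exercise each branch and do not come from the study.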

Conclusion: The analysis of pI distribution using orthologs provides a basis for resolving theoretical proteome comparisons at the level of individual proteins. Orthologs identified as varying significantly between the major acidic and basic regions may be used as representatives of the variation of the entire proteome.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacterial Proteins
  • Cluster Analysis
  • Computational Biology / methods*
  • Computer Simulation
  • Databases, Protein
  • Electrophoresis, Gel, Two-Dimensional
  • Genome, Bacterial*
  • Hydrogen-Ion Concentration
  • Isoelectric Point
  • Models, Statistical
  • Open Reading Frames
  • Proteins / chemistry
  • Proteome*
  • Proteomics / methods*

Substances

  • Bacterial Proteins
  • Proteins
  • Proteome