Modeling and characterization of multi-charge mass spectra for peptide sequencing

J Bioinform Comput Biol. 2006 Dec;4(6):1329-52. doi: 10.1142/s021972000600248x.


Peptide sequencing using tandem mass spectrometry data is an important and challenging problem in proteomics. We address the problem of peptide sequencing for multi-charge spectra. Most current peptide sequencing algorithms consider only ions of charge one or two, even for higher-charge spectra. We give a characterization of multi-charge spectra by generalizing existing models. Using our models, we analyzed spectra from the Global Proteome Machine (GPM) [Craig R, Cortens JP, Beavis RC, J Proteome Res 3:1234-1242, 2004.] (charges 1-5), the Institute for Systems Biology (ISB) [Keller A, Purvine S, Nesvizhskii AI, Stolyar S, Goodlett DR, Kolker E, OMICS 6:207-212, 2002.] and Orbitrap (both with charges 1-3). Our analysis of the GPM dataset shows that higher-charge peaks contribute significantly to prediction of the complete peptide. The analysis also helps explain why existing algorithms do not perform well on multi-charge spectra. Based on these analyses, we claim that peptide sequencing algorithms can achieve higher sensitivity if they also consider higher-charge ions. We verify this claim by proposing a de novo sequencing algorithm, the greedy best strong tag (GBST) algorithm, which is simple but considers higher-charge ions based on our new model. Evaluation on multi-charge spectra shows that our simple GBST algorithm outperforms Lutefisk and PepNovo, especially for GPM spectra of charge three or more.
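The abstract does not spell out the authors' multi-charge model or the GBST algorithm, so the sketch below only illustrates the standard relationship behind "higher-charge ions": a fragment of neutral mass M carrying z protons appears at m/z = (M + z * proton) / z, so a charge-3 spectrum can contain each b or y fragment at several m/z positions. Everything here is an assumption for illustration: monoisotopic residue masses, b and y ions only, no neutral losses or modifications, a hypothetical 0.5 Da matching tolerance, and the hypothetical helper names `fragment_mz` and `count_explained`. It is not the paper's method.

```python
# Illustrative sketch only: b/y fragment m/z at charges 1..max_charge,
# plus a naive count of which charge states explain observed peaks.
# Assumptions: monoisotopic masses, b/y ions only, no neutral losses.

PROTON = 1.007276   # proton mass (Da)
WATER = 18.010565   # monoisotopic H2O (Da), added to y ions

RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def fragment_mz(peptide, max_charge):
    """Map (ion_type, ion_index, charge) -> theoretical m/z for b and y ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    ions = {}
    prefix = 0.0
    for i in range(1, len(peptide)):
        prefix += masses[i - 1]          # residues 1..i form the b_i ion
        suffix = total - prefix          # remaining residues form y_(n-i)
        for z in range(1, max_charge + 1):
            ions[("b", i, z)] = (prefix + z * PROTON) / z
            ions[("y", len(peptide) - i, z)] = (suffix + WATER + z * PROTON) / z
    return ions


def count_explained(peaks, ions, tol=0.5):
    """Count observed peaks matched within tol, keyed by the lowest matching charge."""
    counts = {}
    for mz in peaks:
        charges = [key[2] for key, theo in ions.items() if abs(theo - mz) <= tol]
        if charges:
            z = min(charges)  # design choice: credit the lowest charge that fits
            counts[z] = counts.get(z, 0) + 1
    return counts


if __name__ == "__main__":
    ions = fragment_mz("GA", max_charge=2)
    for key, mz in sorted(ions.items()):
        print(key, round(mz, 4))
    print(count_explained([58.03, 45.53, 200.0], ions))
```

A peak that only matches a charge-2 or charge-3 fragment is invisible to a method that enumerates charge-1 ions alone, which is the gap the abstract points to. Crediting each peak to its lowest matching charge is one simple convention; a real scorer would also weigh intensity and ambiguity.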

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Computer Simulation
  • Mass Spectrometry / methods*
  • Models, Chemical*
  • Molecular Sequence Data
  • Peptide Mapping / methods*
  • Peptides / chemistry*
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*

Substances

  • Peptides