On preprocessing and antisymmetry in de novo peptide sequencing: improving efficiency and accuracy

J Bioinform Comput Biol. 2008 Jun;6(3):467-92. doi: 10.1142/s0219720008003503.


Peptide sequencing plays a fundamental role in proteomics. Tandem mass spectrometry, being sensitive and efficient, is one of the most commonly used techniques in peptide sequencing. Many computational models and algorithms have been developed for peptide sequencing using tandem mass spectrometry. In this paper, we investigate general issues in de novo sequencing and present results that can be used to improve current de novo sequencing algorithms. We propose a general preprocessing scheme that performs binning, pseudo-peak introduction, and noise removal, and present theoretical and experimental analyses of each component. We then study the antisymmetry problem and the current assumptions related to it, and propose a more realistic way to handle it based on an analysis of some datasets. We integrate our findings on preprocessing and the antisymmetry problem with some current models for peptide sequencing. Experimental results show that our findings help improve the accuracy of de novo sequencing.
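
To make the terms in the abstract concrete, the following is a minimal Python sketch of two of the preprocessing steps (binning and noise removal) and of the complementary-peak relation that underlies the antisymmetry problem. It is not the authors' method: the function names, the bin width, the window size, the `top_k` cutoff, and the mass tolerance are all hypothetical choices made for illustration, and all ions are assumed to be singly charged. Pseudo-peak introduction is omitted because the abstract does not describe how it is done. The antisymmetry restriction, as commonly formulated for spectrum graphs, asks that a peak be interpreted as either a prefix (b) ion or a suffix (y) ion but not both; the sketch only locates the peak pairs to which such a restriction would apply.

```python
from collections import defaultdict

PROTON = 1.00728  # approximate proton mass in Da


def bin_peaks(peaks, bin_width=0.5):
    """Merge peaks that fall in the same fixed-width m/z bin.

    peaks: iterable of (mz, intensity). Intensities in a bin are summed and
    the merged m/z is the intensity-weighted mean. bin_width is an assumed value.
    """
    bins = defaultdict(lambda: [0.0, 0.0])  # bin index -> [sum(mz * I), sum(I)]
    for mz, intensity in peaks:
        idx = int(mz // bin_width)
        bins[idx][0] += mz * intensity
        bins[idx][1] += intensity
    return sorted((w / total, total) for w, total in bins.values() if total > 0)


def remove_noise(peaks, window=100.0, top_k=6):
    """Keep only the top_k most intense peaks in each m/z window (assumed heuristic)."""
    windows = defaultdict(list)
    for mz, intensity in peaks:
        windows[int(mz // window)].append((mz, intensity))
    kept = []
    for group in windows.values():
        group.sort(key=lambda p: p[1], reverse=True)
        kept.extend(group[:top_k])
    return sorted(kept)


def complementary_pairs(peaks, parent_mass, tol=0.5):
    """Return index pairs (i, j), i < j, whose m/z values sum to parent_mass + 2 * PROTON.

    parent_mass is the neutral peptide mass. For singly charged ions, a b-ion and
    its complementary y-ion satisfy this relation, so each such pair is a candidate
    for the antisymmetry restriction.
    """
    target = parent_mass + 2 * PROTON
    mzs = [mz for mz, _ in peaks]
    pairs = []
    for i in range(len(mzs)):
        for j in range(i + 1, len(mzs)):
            if abs(mzs[i] + mzs[j] - target) <= tol:
                pairs.append((i, j))
    return pairs
```

A typical use would chain the steps, for example `complementary_pairs(remove_noise(bin_peaks(raw_peaks)), parent_mass)`, where `raw_peaks` is a list of `(mz, intensity)` tuples; the quadratic pair search is kept for clarity and would be replaced by a two-pointer scan for large spectra.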

MeSH terms

  • Amino Acid Sequence*
  • Computational Biology
  • Efficiency, Organizational
  • Molecular Sequence Data
  • Peptides / analysis*
  • Proteomics / methods*
  • Sequence Analysis, Protein / methods*


Substances

  • Peptides