pNovo+: De Novo Peptide Sequencing Using Complementary HCD and ETD Tandem Mass Spectra

J Proteome Res. 2013 Feb 1;12(2):615-25. doi: 10.1021/pr3006843. Epub 2012 Dec 28.

Abstract

De novo peptide sequencing is the only tool for extracting peptide sequences directly from tandem mass spectrometry (MS) data without any protein database. However, neither the accuracy nor the efficiency of de novo sequencing has been satisfactory, mainly due to incomplete fragmentation information in experimental spectra. Recent advances in MS technology have enabled the acquisition of higher-energy collisional dissociation (HCD) and electron transfer dissociation (ETD) spectra of the same precursor. These spectra contain complementary fragmentation information and can be collected with high resolution and high mass accuracy. Taking advantage of these properties, we have developed a new algorithm called pNovo+, which greatly improves the accuracy and speed of de novo sequencing. On tryptic peptides, 86% of the topmost candidate sequences deduced by pNovo+ from HCD + ETD spectral pairs matched the database search results, and the success rate reached 95% if the top three candidates were included, which was much higher than with only HCD (87%) or only ETD (57%) spectra. On Asp-N, Glu-C, or Elastase digested peptides, 69-87% of the HCD + ETD spectral pairs were correctly identified by pNovo+ among the topmost candidates, or 84-95% among the top three. On average, it takes pNovo+ only 0.018 s to extract the sequence from a spectrum or spectral pair on a common personal computer. This is more than three times as fast as other de novo sequencing programs. The increase in speed is mainly due to pDAG, a component algorithm of pNovo+. pDAG finds the k longest paths in a directed acyclic graph without the antisymmetry restriction. We have verified that the antisymmetry restriction is unnecessary for high resolution, high mass accuracy data. The extensive use of HCD and ETD spectral information and the pDAG algorithm make pNovo+ an excellent de novo sequencing tool.
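
The abstract describes pDAG only as finding the k longest paths in a directed acyclic graph. As a rough illustration of that general problem, and not of the pDAG implementation itself, the following Python sketch keeps the k best-scoring partial paths at each node while visiting nodes in topological order. The function name `k_longest_paths`, the edge format, and the toy graph are all assumptions made for this example; in a de novo sequencing setting one would typically expect nodes to stand for candidate prefix masses and edge weights for fragment-ion evidence, but that mapping is not specified here.

```python
import heapq
from collections import defaultdict, deque


def k_longest_paths(edges, source, sink, k):
    """Return up to k highest-scoring source->sink paths in a weighted DAG.

    edges: iterable of (u, v, weight) tuples (hypothetical format).
    Returns a list of (score, path) pairs sorted by descending score.
    """
    adj = defaultdict(list)
    indegree = defaultdict(int)
    nodes = {source, sink}
    for u, v, w in edges:
        adj[u].append((v, w))
        indegree[v] += 1
        nodes.update((u, v))

    # Kahn's algorithm: produce a topological order of all nodes.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # best[n] holds up to k (score, path) pairs for paths source -> n.
    best = defaultdict(list)
    best[source] = [(0.0, (source,))]
    for u in order:
        # All predecessors of u were processed earlier, so best[u] is final.
        for v, w in adj[u]:
            candidates = best[v] + [(s + w, p + (v,)) for s, p in best[u]]
            best[v] = heapq.nlargest(k, candidates, key=lambda c: c[0])

    ranked = sorted(best[sink], key=lambda c: -c[0])
    return [(score, list(path)) for score, path in ranked]


# Toy graph with made-up weights, for illustration only.
toy_edges = [(0, 1, 1.0), (0, 2, 2.0), (1, 2, 1.5), (1, 3, 1.0), (2, 3, 1.0)]
for score, path in k_longest_paths(toy_edges, 0, 3, 2):
    print(score, path)
```

Because each node stores at most k partial paths, the work per edge is bounded by a function of k alone, which is one plausible reason a k-longest-paths formulation can be fast; how pNovo+ actually scores edges and ranks candidates is not stated in the abstract.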

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Animals
  • Databases, Protein
  • Humans
  • Metalloendopeptidases / chemistry
  • Molecular Sequence Data
  • Pancreatic Elastase / chemistry
  • Peptides / chemistry
  • Peptides / isolation & purification*
  • Sensitivity and Specificity
  • Sequence Analysis, Protein / methods
  • Sequence Analysis, Protein / standards*
  • Serine Endopeptidases / chemistry
  • Tandem Mass Spectrometry / standards*
  • Trypsin / chemistry

Substances

  • Peptides
  • Serine Endopeptidases
  • glutamyl endopeptidase
  • Pancreatic Elastase
  • Trypsin
  • Metalloendopeptidases
  • endoproteinase Asp-N