J Am Soc Mass Spectrom, 26 (11), 1885-94

Novor: Real-Time Peptide De Novo Sequencing Software

Bin Ma. J Am Soc Mass Spectrom.

Abstract

De novo sequencing software has been widely used in proteomics to sequence new peptides from tandem mass spectrometry data. This study presents a new software tool, Novor, that greatly improves both the speed and accuracy of today's peptide de novo sequencing analyses. To improve accuracy, Novor's scoring functions are based on two large decision trees built by machine learning from a peptide spectral library of more than 300,000 spectra. Important knowledge about peptide fragmentation is extracted automatically from the library and incorporated into the scoring functions. The decision tree model also enables efficient score calculation and contributes to the speed improvement. To further improve speed, a two-stage algorithmic approach, namely dynamic programming followed by refinement, is used. The software program was also carefully optimized. On the testing datasets, Novor sequenced 7%-37% more correct residues than the state-of-the-art de novo sequencing tool, PEAKS, while being an order of magnitude faster. Novor can de novo sequence more than 300 MS/MS spectra per second on a laptop computer. This speed surpasses the acquisition speed of today's mass spectrometers and therefore opens a new possibility: de novo sequencing in real time while the spectrometer is acquiring the spectral data.

Keywords: Decision tree; Peptide de novo sequencing; Real time; Software; Tandem mass spectrometry.
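
The abstract names dynamic programming as the first of Novor's two stages but gives no algorithmic detail. As a rough illustration of the general idea only, and not Novor's actual algorithm or scoring, the Python sketch below runs a textbook-style dynamic program over integer prefix masses. The function name `toy_de_novo`, the `RESIDUE_MASS` table (nominal masses for a subset of residues), the `prefix_evidence` input, and the one-point-per-matched-peak score are all assumptions made for this example; Novor uses decision-tree scores instead.

```python
from typing import Dict, List, Optional, Set

# Nominal (integer) residue masses in Da for a subset of amino acids.
# A real tool would use accurate monoisotopic masses and a mass tolerance.
RESIDUE_MASS: Dict[str, int] = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101,
    "L": 113, "N": 114, "D": 115, "K": 128, "E": 129, "F": 147,
}


def toy_de_novo(total_mass: int, prefix_evidence: Set[int]) -> Optional[str]:
    """Return a best-scoring residue sequence whose masses sum to total_mass.

    prefix_evidence holds prefix masses supported by the spectrum
    (hypothetical input; derived from b- or y-ion peaks in practice).
    The score is simply the number of supported prefix masses (assumption).
    """
    neg_inf = float("-inf")
    best: List[float] = [neg_inf] * (total_mass + 1)   # best score per prefix mass
    back: List[Optional[str]] = [None] * (total_mass + 1)  # last residue used
    best[0] = 0.0

    for mass in range(1, total_mass + 1):
        bonus = 1.0 if mass in prefix_evidence else 0.0
        for residue, weight in RESIDUE_MASS.items():
            prev = mass - weight
            if prev < 0 or best[prev] == neg_inf:
                continue  # this residue cannot end a prefix of this mass
            score = best[prev] + bonus
            if score > best[mass]:
                best[mass] = score
                back[mass] = residue

    if best[total_mass] == neg_inf:
        return None  # no residue combination reaches the total mass

    # Walk the back-pointers from the full mass down to zero.
    sequence: List[str] = []
    mass = total_mass
    while mass > 0:
        residue = back[mass]
        sequence.append(residue)
        mass -= RESIDUE_MASS[residue]
    return "".join(reversed(sequence))


if __name__ == "__main__":
    # Prefix masses of G, GA, GAS; the full peptide GASP weighs 312.
    print(toy_de_novo(312, {57, 128, 215}))
```

The table is filled once per integer mass, so the running time grows with the peptide mass times the alphabet size. A refinement stage, as mentioned in the abstract, would then revisit the candidate with a more expensive score; that stage is not sketched here.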

Figures

Figure 1
The precision-recall curves of Novor and PEAKS on the four testing datasets
Figure 2
The maximum recalls of Novor and PEAKS on the four datasets. The values for the hypothetical verifier are the percentages of verifiable residues in the real peptide sequences
Figure 3
The de novo sequencing speeds (spectra/second) of PEAKS and Novor on a MacBook Pro
Figure 4
A small portion of the decision tree automatically learned by the machine learning algorithm. The tree is drawn upside down, following the computer science convention. The percentage value on each edge is the correctness probability of a residue in a de novo sequence, given the branching conditions on the path from the root to the edge
Figure 5
Another small portion in the middle of the residue score decision tree. Proline, glycine, and serine have similar effects on the correctness probability after an unusually abundant left b-ion is seen
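
The captions of Figures 4 and 5 describe a tree in which following the branching conditions from the root yields a correctness probability for a residue. A minimal sketch of that kind of lookup is given below. Everything in it is hypothetical: the `Node` class, the feature names `left_b_ion_abundant` and `residue`, the tree shape, and every probability are placeholders chosen for illustration, not values or features taken from Novor or the paper.

```python
from typing import Callable, Dict, Optional

Features = Dict[str, object]


class Node:
    """One node of a toy decision tree (illustrative structure only)."""

    def __init__(
        self,
        probability: float,
        test: Optional[Callable[[Features], bool]] = None,
        if_true: Optional["Node"] = None,
        if_false: Optional["Node"] = None,
    ) -> None:
        self.probability = probability  # placeholder correctness probability
        self.test = test                # branching condition; None for a leaf
        self.if_true = if_true
        self.if_false = if_false


def residue_probability(node: Node, features: Features) -> float:
    """Follow branching conditions from the root and return the probability
    stored at the last node reached."""
    while node.test is not None:
        child = node.if_true if node.test(features) else node.if_false
        if child is None:
            break  # no deeper branch on this side; stop here
        node = child
    return node.probability


# Hand-built toy tree with arbitrary numbers (assumption, not learned from data).
TOY_TREE = Node(
    0.5,
    test=lambda f: bool(f["left_b_ion_abundant"]),
    if_true=Node(
        0.6,
        test=lambda f: f["residue"] in {"P", "G", "S"},
        if_true=Node(0.9),
        if_false=Node(0.55),
    ),
    if_false=Node(0.4),
)

if __name__ == "__main__":
    print(residue_probability(TOY_TREE, {"left_b_ion_abundant": True, "residue": "P"}))
```

Each lookup costs only a handful of comparisons along one root-to-leaf path, which is consistent with the abstract's remark that the decision tree model enables efficient score calculation.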
