De Novo Sequencing of Peptides from Top-Down Tandem Mass Spectra

J Proteome Res. 2015 Nov 6;14(11):4450-62. doi: 10.1021/pr501244v. Epub 2015 Oct 13.


De novo sequencing of proteins and peptides is one of the most important problems in mass spectrometry-driven proteomics. A variety of methods have been developed to accomplish this task from a set of bottom-up tandem (MS/MS) mass spectra. However, the more recently emerged and increasingly popular top-down technology opens new perspectives for protein analysis and characterization, which creates a need for efficient algorithms to process this kind of MS/MS data. Here, we describe a method for retrieving long and accurate sequence fragments of the proteins contained in a sample from a set of top-down MS/MS spectra. To this end, we outline a strategy for generating high-quality sequence tags from top-down spectra and introduce the concept of a T-Bruijn graph, which adapts the notion of an A-Bruijn graph, widely used in genomics, to the case of tags. The output of the proposed approach is the set of amino acid strings spelled out by optimal paths in the connected components of a T-Bruijn graph. We illustrate its performance on top-down data sets acquired from carbonic anhydrase 2 (CAH2) and the Fab region of alemtuzumab.
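To make the idea of a sequence tag concrete, the following is a minimal sketch of generic tag extraction, not the tag-generation strategy or the T-Bruijn graph construction of the paper, whose details the abstract does not give. It assumes the spectrum has already been deconvoluted to a list of neutral monoisotopic fragment masses, and it spells a tag by chaining peaks whose mass differences match a standard amino acid residue mass. The names `RESIDUE_MASS`, `match_residue`, and `longest_tag`, and the 0.01 Da tolerance, are illustrative assumptions.

```python
# Standard monoisotopic residue masses in Da.
# Leucine and isoleucine are isobaric, so both are reported as "L".
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
MAX_RESIDUE = max(RESIDUE_MASS.values())


def match_residue(delta, tol=0.01):
    """Return the residue closest to `delta` within `tol` Da, or None."""
    best, best_err = None, tol
    for aa, mass in RESIDUE_MASS.items():
        err = abs(delta - mass)
        if err <= best_err:
            best, best_err = aa, err
    return best


def longest_tag(peaks, tol=0.01):
    """Longest residue string spelled by a chain of peaks in ascending mass.

    Whether the string reads N-to-C or C-to-N depends on the ion series,
    which this sketch does not try to determine.
    """
    peaks = sorted(peaks)
    n = len(peaks)
    # best[i] holds the longest tag spelled by a chain starting at peak i.
    best = [""] * n
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            delta = peaks[j] - peaks[i]
            if delta > MAX_RESIDUE + tol:
                break  # later peaks are even farther away
            aa = match_residue(delta, tol)
            if aa is not None and 1 + len(best[j]) > len(best[i]):
                best[i] = aa + best[j]
    return max(best, key=len, default="")


# Four peaks spaced by G, A, S, plus one unrelated noise peak at 200.5.
print(longest_tag([100.0, 157.02146, 200.5, 228.05857, 315.0906]))  # GAS
```

Working backwards from the heaviest peak keeps the search to a single pass over peak pairs. A real pipeline would additionally need peak scoring, handling of modifications, and a way to combine tags from many spectra, which is the role the abstract assigns to the T-Bruijn graph.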

Keywords: T-Bruijn graph; de novo sequencing; top-down mass spectrometry.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alemtuzumab
  • Algorithms*
  • Amino Acid Sequence
  • Animals
  • Antibodies, Monoclonal, Humanized / chemistry
  • Carbonic Anhydrase II / chemistry
  • Cattle
  • Databases, Protein
  • Humans
  • Immunoglobulin Fab Fragments / chemistry
  • Molecular Sequence Data
  • Peptides / chemistry
  • Peptides / isolation & purification*
  • Proteomics / methods
  • Proteomics / statistics & numerical data*
  • Sequence Analysis, Protein / statistics & numerical data*
  • Staining and Labeling / methods
  • Tandem Mass Spectrometry / statistics & numerical data*


Substances

  • Antibodies, Monoclonal, Humanized
  • Immunoglobulin Fab Fragments
  • Peptides
  • Alemtuzumab
  • Carbonic Anhydrase II