Sequence database searches via de novo peptide sequencing by tandem mass spectrometry

Rapid Commun Mass Spectrom. 1997;11(9):1067-75. doi: 10.1002/(SICI)1097-0231(19970615)11:9<1067::AID-RCM953>3.0.CO;2-L.


A method is described for searching protein sequence databases using tandem mass spectra of tryptic peptides. The approach uses a de novo sequencing algorithm to derive a short list of possible sequence candidates which serve as query sequences in a subsequent homology-based database search routine. The sequencing algorithm employs a graph theory approach similar to previously described sequencing programs. In addition, amino acid composition, peptide sequence tags and incomplete or ambiguous Edman sequence data can be used to aid in the sequence determinations. Although sequencing of peptides from tandem mass spectra is possible, one of the frequently encountered difficulties is that several alternative sequences can be deduced from one spectrum. Most of the alternative sequences, however, are sufficiently similar for a homology-based sequence database search to be possible. Unfortunately, the available protein sequence database search algorithms (e.g. BLAST or FASTA) require a single unambiguous sequence as input. Here we describe how the publicly available FASTA computer program was modified in order to search protein databases more effectively in spite of the ambiguities intrinsic to de novo peptide sequencing algorithms.
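The graph-based idea mentioned in the abstract, and the reason one spectrum can yield several candidate sequences, can be illustrated with a minimal sketch. This is not the authors' program: it assumes an idealized, noise-free and complete ladder of prefix masses (sums of residue masses, with no charge or terminal-group corrections), allows only single-residue edges, and uses the hypothetical names `prefix_masses`, `candidate_sequences` and `tol`. Each mass is a node, two nodes are joined when their difference matches an amino acid residue mass within the tolerance, and every path from mass 0 to the total mass spells one candidate sequence.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def prefix_masses(peptide):
    """Idealized node set: cumulative residue masses, starting at 0."""
    total = 0.0
    masses = [0.0]
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def candidate_sequences(nodes, tol=0.02):
    """Enumerate every sequence spelled by a path from the lowest to the highest node.

    `tol` is the mass tolerance in Da (an assumed, illustrative value).
    """
    nodes = sorted(nodes)
    n = len(nodes)

    # Build the graph: an edge i -> j carries every residue whose mass
    # matches the mass difference between the two nodes.
    edges = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            diff = nodes[j] - nodes[i]
            labels = [aa for aa, m in RESIDUE_MASS.items() if abs(m - diff) <= tol]
            if labels:
                edges[i].append((j, labels))

    # Depth-first walk collecting one string per path and per label choice.
    results = []

    def walk(i, prefix):
        if i == n - 1:
            results.append(prefix)
            return
        for j, labels in edges[i]:
            for aa in labels:
                walk(j, prefix + aa)

    walk(0, "")
    return sorted(results)


if __name__ == "__main__":
    ladder = prefix_masses("GLK")
    print(candidate_sequences(ladder))            # ['GIK', 'GLK']
    print(candidate_sequences(ladder, tol=0.05))  # ['GIK', 'GIQ', 'GLK', 'GLQ']
```

In this toy case the candidates differ only at positions where residues are isobaric (I/L) or nearly so (K/Q at the looser tolerance), which is consistent with the abstract's observation that the alternative sequences remain similar enough for a homology-based search.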

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Animals
  • Database Management Systems
  • Databases, Factual*
  • Humans
  • Hydrolysis
  • Mass Spectrometry / instrumentation*
  • Molecular Sequence Data
  • Peptides / analysis*
  • Peptides / chemistry
  • Sequence Homology, Amino Acid
  • Trypsin


Substances

  • Peptides
  • Trypsin