Assessing Protein Sequence Database Suitability Using De Novo Sequencing

Mol Cell Proteomics. 2020 Jan;19(1):198-208. doi: 10.1074/mcp.TIR119.001752. Epub 2019 Nov 15.


The analysis of samples from unsequenced and/or understudied species, as well as samples in which the proteome is derived from multiple organisms, poses two key questions. The first is whether the proteomic data obtained from an unusual sample type contain peptide tandem mass spectra at all. The second is whether an appropriate protein sequence database is available for proteomic searches. We describe the use of automated de novo sequencing for evaluating both the quality of a collection of tandem mass spectra and the suitability of a given protein sequence database for searching those data. Applications of this method include the proteome analysis of closely related species, metaproteomics, and proteomics of extinct organisms.
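The two questions posed in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give formulas, so the record type `SpectrumResult`, the `min_score` cutoff, and both ratio definitions below are assumptions chosen only for illustration. Spectral quality is approximated here as the fraction of spectra that receive a confident de novo sequence, and database suitability as the fraction of those confident spectra whose peptide is also found in the candidate database.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SpectrumResult:
    """Hypothetical per-spectrum summary (not a format defined in the paper)."""
    spectrum_id: str
    de_novo_score: float    # assumed confidence of the best de novo sequence, 0..1
    matched_database: bool  # assumed flag: de novo peptide also found in the database


def confident_results(results: Sequence[SpectrumResult], min_score: float):
    """Keep spectra whose de novo score reaches the assumed cutoff."""
    return [r for r in results if r.de_novo_score >= min_score]


def spectra_quality(results: Sequence[SpectrumResult], min_score: float = 0.5) -> float:
    """Fraction of all spectra with a confident de novo sequence."""
    if not results:
        return 0.0
    return len(confident_results(results, min_score)) / len(results)


def database_suitability(results: Sequence[SpectrumResult], min_score: float = 0.5) -> float:
    """Fraction of confident spectra whose peptide is present in the database."""
    confident = confident_results(results, min_score)
    if not confident:
        return 0.0
    return sum(r.matched_database for r in confident) / len(confident)


# Toy data, invented purely to exercise the functions.
example = [
    SpectrumResult("s1", 0.9, True),
    SpectrumResult("s2", 0.8, False),
    SpectrumResult("s3", 0.7, True),
    SpectrumResult("s4", 0.2, False),
]
quality = spectra_quality(example)
suitability = database_suitability(example)
```

Under these assumptions, a low quality value would suggest that the data set contains few interpretable peptide spectra, while a high quality value combined with a low suitability value would suggest that the spectra are good but the database is a poor match for the sample.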

Keywords: Algorithms; Caenorhabditis elegans; data evaluation; de novo sequencing; mass spectrometry; metaproteomics; peptides*; protein identification; quality control and metrics; sequencing MS; tandem mass spectrometry.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Animals
  • Caenorhabditis elegans
  • Databases, Protein*
  • Hemiptera
  • Humans
  • K562 Cells
  • Peptides / analysis
  • Proteins / analysis
  • Proteome / analysis*
  • Proteomics / methods*
  • Sequence Analysis, Protein / methods*
  • Skates, Fish
  • Software
  • Tandem Mass Spectrometry / methods*
  • Ursidae


Substances

  • Peptides
  • Proteins
  • Proteome