The analysis of samples from unsequenced and/or understudied species, as well as samples whose proteome is derived from multiple organisms, poses two key questions. The first is whether the proteomic data obtained from an unusual sample type contain peptide tandem mass spectra at all. The second is whether an appropriate protein sequence database is available for proteomic searches. We describe the use of automated de novo sequencing for evaluating both the quality of a collection of tandem mass spectra and the suitability of a given protein sequence database for searching those data. Applications of this method include the proteome analysis of closely related species, metaproteomics, and proteomics of extinct organisms.
Keywords: algorithms; Caenorhabditis elegans; data evaluation; de novo sequencing; mass spectrometry; metaproteomics; peptides; protein identification; quality control and metrics; sequencing MS; tandem mass spectrometry.
© 2020 Johnson et al.