Improving the reliability and throughput of mass spectrometry-based proteomics by spectrum quality filtering

Proteomics. 2006 Apr;6(7):2086-94. doi: 10.1002/pmic.200500309.


In contemporary peptide-centric or non-gel proteome studies, vast amounts of peptide fragmentation data are generated, of which only a small part leads to peptide or protein identification. This motivates the development and use of a filtering algorithm that removes spectra contributing little to protein identification. Removing unidentifiable spectra reduces both the computational and human time spent analyzing spectra and the chance of obtaining false identifications. Thorough testing on various proteome datasets from different instruments showed that the best of the proposed machine-learning classifiers recognizes, on average, half of the unidentified spectra as bad spectra. Further analyses showed that several unidentified spectra classified as good were derived from peptides carrying unanticipated amino acid modifications, or contained sequence tags that allowed peptide identification by homology searches. The implementation of the classifiers is available under the GNU General Public License at
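
The abstract does not state which spectrum features or classifier were used, so the following is only a minimal sketch of the general idea under loudly stated assumptions. Each spectrum is assumed to be a list of (m/z, intensity) peaks; two hypothetical quality features (log peak count, and log count of peak pairs whose m/z difference matches a single amino acid residue mass within 0.02 Da) feed a plain logistic regression; and spectra identified by a database search are assumed to serve as "good" training examples. The names spectrum_features, train_logistic, quality_score, filter_spectra and fraction_unidentified_removed are illustrative and are not the published implementation.

```python
import math
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); these toy features ignore intensity
Model = Tuple[List[float], float]  # (feature weights, bias)

# Monoisotopic amino acid residue masses in Da (Leu and Ile share 113.084).
RESIDUE_MASSES = (57.021, 71.037, 87.032, 97.053, 99.068, 101.048, 103.009,
                  113.084, 114.043, 115.027, 128.059, 128.095, 129.043,
                  131.040, 137.059, 147.068, 156.101, 163.063, 186.079)


def spectrum_features(peaks: Sequence[Peak], tol: float = 0.02) -> List[float]:
    """Hypothetical features: log peak count and log count of residue-mass gaps."""
    mz = sorted(p[0] for p in peaks)
    gaps = 0
    for i in range(len(mz)):
        for j in range(i + 1, len(mz)):
            diff = mz[j] - mz[i]
            # A gap equal to one residue mass hints at a fragment-ion ladder.
            if any(abs(diff - r) <= tol for r in RESIDUE_MASSES):
                gaps += 1
    return [math.log1p(len(mz)), math.log1p(gaps)]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wk * xk for wk, xk in zip(w, x))


def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 0.25, epochs: int = 2000) -> Model:
    """Batch gradient ascent on the mean logistic log-likelihood (1 = good, 0 = bad)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = yi - _sigmoid(_dot(w, xi) + b)
            gb += err
            for k in range(d):
                gw[k] += err * xi[k]
        w = [wk + lr * gk / n for wk, gk in zip(w, gw)]
        b += lr * gb / n
    return w, b


def quality_score(peaks: Sequence[Peak], model: Model) -> float:
    """Estimated probability that a spectrum is 'good'."""
    w, b = model
    return _sigmoid(_dot(w, spectrum_features(peaks)) + b)


def filter_spectra(spectra, model: Model, threshold: float = 0.5):
    """Keep only spectra scored at or above the threshold."""
    return [s for s in spectra if quality_score(s, model) >= threshold]


def fraction_unidentified_removed(spectra, identified, model: Model,
                                  threshold: float = 0.5) -> float:
    """Share of unidentified spectra that the filter would discard."""
    unidentified = [s for s, ok in zip(spectra, identified) if not ok]
    if not unidentified:
        return 0.0
    removed = sum(1 for s in unidentified if quality_score(s, model) < threshold)
    return removed / len(unidentified)
```

The last helper mirrors the kind of figure quoted in the abstract: the share of unidentified spectra that the filter discards at a given probability threshold. Raising the threshold removes more spectra but also risks discarding identifiable ones, which is the trade-off an ROC curve summarizes.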

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Adult
  • Algorithms
  • Computational Biology
  • Humans
  • Jurkat Cells
  • Peptides / analysis
  • Peptides / chemistry
  • Proteins / analysis*
  • Proteins / chemistry
  • Proteomics / methods*
  • ROC Curve
  • Reproducibility of Results
  • Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization / methods*

Substances

  • Peptides
  • Proteins